ABSTRACT


During the past three decades, many organizations have seen a 
dramatic proliferation of a variety of information systems. 
Organizations soon discovered the need to access and share data 
across these different information systems. Under current 
technology, this integration is usually not possible due to the 
heterogeneity of information systems. One level of heterogeneity 
is that of semantics. The objective of this thesis is to build a
framework for enumerating, classifying, and resolving the types of
semantic heterogeneity that could exist in an object-oriented
database model. The framework covers both schema and data content
conflicts. The schema conflicts are classified broadly by the
level at which they occur. The primary data conflicts covered
include inconsistencies and different representations for the same
data.


I. INTRODUCTION


A. BACKGROUND 

During the past three decades, many organizations have 
seen a dramatic proliferation of a variety of information 
systems. While these information systems are useful in 
supporting their different activities, organizations soon 
discovered the need to access and share data across these 
different information systems. 

Under current technology, this integration is usually not 
possible due to the heterogeneity of information systems. 
This heterogeneity exists at three basic levels (Bertino, 
1989). The first is the information system level. Data is 
managed by a variety of information systems based on different 
data models and languages. The second level of heterogeneity 
is that of semantics. Since different information systems 
have been designed independently, semantic conflicts are 
likely to be present. This includes both schema (e.g., name, 
type conflicts) and data (e.g., inconsistencies) conflicts.
Finally, the third level of heterogeneity is that of hardware, 
operating systems, and communications. 

Several approaches have been proposed to address the
issues of integrating heterogeneous information systems
(Sheth, 1990, pp.183-236). A common theme of these approaches
is the need for a semantically rich integrating model to
represent and integrate the different component information
systems and to resolve the conflicts among them. In this thesis
the issues of identifying and resolving semantic conflicts by
using a generic object-oriented data model as the integrating
model are examined.


B. OBJECTIVES 

The objective of this thesis is to build a framework for 
enumerating, classifying, and resolving the types of semantic 
heterogeneity that could exist in an object-oriented database 
model. The framework will cover both schema and data content 
conflicts. The schema conflicts are classified broadly by the 
level at which they occur. These levels are: object level 
conflicts, attribute level conflicts, object-attribute level 
conflicts, and object method conflicts. The primary data 
conflicts covered include inconsistencies and different 
representations for the same data. 

To accomplish the objective, a real world database
scenario and a generic object-oriented model are presented,
and the conflict framework is proposed.


C. RESEARCH QUESTIONS 
1. Can a workable framework for classifying and
enumerating schema and data heterogeneity conflicts in an
object-oriented database model be developed?

2. Can solution guidelines for the identified schema
and data heterogeneity conflicts in an object-oriented
database model be developed?


D. SCOPE AND LIMITATIONS 

This thesis will briefly describe the three levels of 
heterogeneity. It will then focus on building a framework for 
enumerating and classifying schema and data conflicts in an
object-oriented database model and propose guidelines for
conflict resolution. A similar framework is presented for
use with a relational model. Information systems level,
hardware, operating systems, and communications heterogeneity
will not be addressed in this thesis.


E. METHODOLOGY 

This research started with a literature review of 
pertinent topics including: object-oriented models, object- 
oriented databases, federated database systems, multidatabase 
systems, schematic and data heterogeneity issues, information 
systems proliferation issues, and specific Department of
Defense and Department of the Navy information systems
proliferation problems. The second step was to identify the
generic object-oriented model used in this research. The
third step was to identify a usable real world database
scenario to use for research. The fourth step was the
development of the proposed framework. The final step was the
development of the guidelines for resolving the identified
conflicts.


F. ORGANIZATION OF THESIS 

The organization of the remainder of the thesis is as 
follows. Chapter II explains the background of the issue, 
presents a rationale of why the problems related to 
heterogeneity evolved, and explains the different types of 
heterogeneity in information systems. Chapter III overviews 
a framework for classifying schematic and data conflicts in 
a relational model. Chapter IV presents the main 
characteristics of the object-oriented model used in this 
research. Chapter V presents the database scenario used in 
this research. Chapter VI develops a framework for 
classifying the schematic and data conflicts of the object- 
oriented model presented in Chapter IV and uses the database 
scenario in Chapter V to illustrate conflict examples. 
Chapter VII presents guidelines for resolving the conflicts 
identified in Chapter VI. Chapter VIII concludes the thesis
with a summary and provides directions for future research.


II. BACKGROUND ON HETEROGENEOUS DATABASES 


A. BACKGROUND 

Today, from administrative to operational commands, the 
use of computers to solve data manipulation problems is very 
common. This fact had its foundations with the widespread use 
of mainframe computers in the 1960s. In the military, these 
early computers were mostly associated with data and research 
centers. Initially, programs were developed for specialized 
applications that relied on large amounts of data typically 
stored on disks. The data was stored in flat file systems 
and uniquely addressed by the programs developed to use it. 
It was apparent that a large amount of information was common 
to different applications and that there was a need to share 
access to data (Parsaye, 1989, pp.36). 

As technology was introduced to organizations, the type of 
administrative control within the organization dictated how 
new technology was exploited. The DOD had a slack environment 
when database technology was introduced. Each branch of the 
service was allowed to operate independently with little or no
guidance. Within the services, major commands also acted
independently. Though this lack of coordination led to many
duplicate efforts and a lack of standardization, this approach
had some benefits. To quote Richard Nolan (Nolan’s stages of
growth):


"The balance between control and slack is 
important in developing appropriate management 
approaches for each stage of an organizational 
learning. For example, an imbalance of high control 
and low slack in the earlier stages can impede the use 
of information technology in the organization; 
conversely, an imbalance of low control and high slack 
in the latter stages can lead to explosive data 
processing budget increases and inefficient systems." 
(Nolan, 1979, pp.127)


In the March-April 1979 Harvard Business Review, Richard
Nolan wrote an article, "Managing the Crisis in Data
Processing." In this article Nolan proposed six stages of
growth. The first is the initiation stage where new
technology is first introduced. The second stage is the
contagion stage where proliferation of the technology begins.
The third stage is the control stage where formalized planning 
and control are introduced. The fourth stage is the 
integration stage where plans are tailored to include all 
aspects of the organization. The fifth stage is the data 
administration stage where the organization has complete 
shared data and common systems. The sixth and final stage is 
maturity where data is used as a strategic resource. 
Examining database development in terms of Nolan’s stages 
of growth helps explain the proliferation of databases in the 
DOD. In those terms, the widespread use of databases started
in the contagion stage. In this stage, senior and middle
managers became frustrated in their attempts to obtain
information from centralized systems. This frustration led to
proposals for more local databases. In DOD/DON, the initial
emphasis on data centers generated end-user frustration. Data
was supplied by the end-users, but access to that data was
limited. To resolve this problem many end-users throughout 
the DOD/DON chain of command started to develop their own 
specialized databases. 

The databases that evolved were influenced by the data 
model that was in vogue at the time of development. Once 
these databases were populated, it was perceived as cheaper to
maintain them than to standardize on one model or format.
This was partly due to the view end-users took of their data. 
Data was viewed as proprietary, not a strategic asset of the 
entire DOD/DON. 

After the explosive growth of databases, the DOD/DON 
entered the control phase of Nolan’s growth model. Here the 
emphasis is on reduced costs. Redundancies are seen as
wasteful. This issue was discussed in appropriations testimony
before the House of Representatives on the Corporate 
Information Management program. One example cited is the DOD 
payroll systems. Throughout the DOD there are 27 different 
civilian payroll systems, each with an associated database.
These systems range from 25-year-old to state-of-the-art
technology. As for the Navy, the DON alone had nine systems
in use at the time of the testimony (DOD, 1991, pp.21-22).
This example highlights some proliferation problems associated 
with the contagion stage of Nolan’s growth model. As we move
further along in the control stage, many of these redundancies
will be examined in detail. This is part of the ongoing
Corporate Information Management (CIM) initiative. Often, 
redesigning and rebuilding systems from scratch to eliminate 
redundancies is not feasible. Designing systems that can 
access data already available is a more likely option. 
However, the requirement is a system that can access and share 
data across the existing heterogeneous databases. This 
process has defaulted to a manual one that combines numerous 
queries across the heterogeneous databases of interest. To 
avoid the inefficiencies created by this manual process, 
problems related to homogenizing heterogeneous databases must 
be resolved. Solving these problems requires an
understanding of the different types of heterogeneity that
exist in database applications.


B. TYPES OF HETEROGENEITY 

Heterogeneity exists at three basic levels. The first is 
the information systems level. Data is managed by a variety 
of information systems based on different data models and 
languages (e.g., file systems, navigational database systems, 
relational database systems, etc.). The second level of
heterogeneity is that of semantics. Since different
information systems have been designed independently, semantic
conflicts are likely to be present. This includes both schema 
(e.g., name, type conflicts) and data (e.g., inconsistencies) 
conflicts. Finally, the third level of heterogeneity is that 
of hardware, operating systems, and communications. The three
levels of heterogeneity as they apply to database applications
are discussed briefly in the following sections.

1. Database Management Systems Heterogeneity 

The need to share large amounts of data led to the 
development of centralized databases and database management 
systems. The data was grouped by files of records. Each
record contained several attributes. Managing the files via
a database consisted of three primary tasks: defining the data
structure, developing a data manipulation language, and
developing a data query language (Parsaye, 1989, pp.40).

The data manipulation and query languages depended on
how the user perceived the data in the database. The three
core models that evolved were the hierarchical, network, and 
relational models. All three of these data models are still 
in use. 

The hierarchical model is based on the concepts of a 
tree structure. Each node has branches that point to the 
children of that node. Every node has a parent except for the 
root node. Hierarchical databases often exhibit poor
flexibility, but have good performance.


The network model is similar to the hierarchical 
model. However, it uses additional pointers so that links 
between any nodes can be created. CODASYL is a good example 
of a network model that developed out of the COBOL language 
(Gillenson, 1990, pp.256). Both the hierarchical and network 
models are considered navigational data models which get their 
power from storage and retrieval techniques. 
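The two navigational models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `links` list, and the sample records (Command, Squadron, Pilot, Aircraft) are hypothetical and do not correspond to any particular hierarchical or CODASYL product.

```python
class Node:
    """A record in a navigational database (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # hierarchical model: at most one parent
        self.children = []        # branches pointing to the children
        self.links = []           # network model: extra pointers to any node

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def path_to_root(node):
    """Navigate upward through parent pointers until the root is reached."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path


# Hierarchical model: every node except the root has a parent.
root = Node("Command")
squadron = root.add_child("Squadron")
pilot = squadron.add_child("Pilot")
aircraft = squadron.add_child("Aircraft")

# Network model: an additional pointer links two nodes that are
# not in a parent-child relationship.
pilot.links.append(aircraft)
```

In both cases a query is answered by following pointers from record to record, which is why these models are called navigational.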

The relational model uses tables to view data. It is 
based on the concept that data is organized and stored in two- 
dimensional tables called relations. Each row in a table 
represents a record. Each column represents a field. The
entire table is roughly equivalent to a file (Kroenke, 1988).
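The row, column, and table vocabulary can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module purely as a convenient relational engine; the `pilot` table, its columns, and its rows are invented for illustration and are not taken from the thesis scenario.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The relation (table) is roughly equivalent to a file.
conn.execute("CREATE TABLE pilot (name TEXT, grade TEXT, flight_hours REAL)")

# Each row is a record; each column is a field.
conn.executemany(
    "INSERT INTO pilot VALUES (?, ?, ?)",
    [("Smith", "LT", 412.5), ("Jones", "LCDR", 980.0)],
)

# Data is retrieved by value, not by following pointers.
rows = conn.execute(
    "SELECT name, flight_hours FROM pilot WHERE grade = ? ORDER BY name",
    ("LT",),
).fetchall()

# Column (field) names as recorded in the table definition.
columns = [info[1] for info in conn.execute("PRAGMA table_info(pilot)")]
```

Unlike the navigational models, the query names the values of interest and leaves the access path to the database system.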

These three models represent the foundation of most 
database management systems (DBMS) in use today. Over time,
the need for adding more semantics to the models was 
recognized. This led to the development of models that tried 
to capture more semantic information. Chief among these 
models was the Entity Relationship Model. An entity is a 
representation of a real world object. Each entity has
properties or attributes. Entities in a particular system 
have symbolically stated relationships. 

The latest data model is the object-oriented model. 
The object-oriented model uses objects to model the domain of 
interest. The objects have names, attributes, and methods
associated with them. Object-oriented databases are gaining
in popularity and the use of the object model as an
integrating data model in heterogeneous environments is the
focus of this thesis.

2. Data and Schematic Heterogeneity 

Since databases are developed independently with 
different designs, semantic conflicts are likely to occur. 
Semantic conflicts are classified as either schema or data
conflicts.

Schema conflicts occur when different structures or
symbology are used to represent the same information, or when
a similar structure or symbology is used to represent
different information. Schema conflicts include name and
structure conflicts. Data conflicts are generally caused by
failures to maintain a database or by data entry errors. These
conflicts include violations of database integrity
constraints, the use of different representations for the same 
data, and inconsistent data. In the next chapter, we present 
an overview of schematic and data heterogeneity in relational 
databases. 

3. Platform Heterogeneity

"Heterogeneous computing environments consist
of dissimilar hardware or software systems.
Because of the diversity, interconnecting
systems is far more difficult in heterogeneous
environments than in homogeneous environments
where each system is based on the same or
closely related, hardware and software."
(Notkin, 1987, pp.41)


Heterogeneity of hardware is often unavoidable. It 
occurs in DOD/DON through the acquisition process. As 
technology evolves, different types of hardware systems are 
developed that meet the specification of proposals which start 
the acquisition process. The DOD/DON traditionally goes for 
the least expensive system that meets the specification 
without regard to existing architecture (unless existing 
architecture is taken into account in the specification). 

The problems that arise due to hardware and software 
heterogeneity generally fall under one of the following 
general areas: interconnection, filing (data storage),
authentication, naming, and user interfaces. The following 
paragraphs give a brief description of each problem. 

Interconnection problems deal with how dissimilar 
systems communicate. Two basic mechanisms for communication 
are message passing and remote procedure calls. Message 
passing consists of passing data asynchronously from one 
process to another. Remote procedure calls provide semantics 
across a network that are similar to procedure calls in a
standard programming language. This type of communication is
synchronous in nature. Either of these methods must work with 
a standard set of communication protocols such as TCP/IP. 
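The difference between the two mechanisms can be sketched locally, without a network. In the sketch below `lookup_hours` is a hypothetical stand-in for a procedure on a remote system, a `queue.Queue` stands in for the message channel, and the sample data is invented; no TCP/IP traffic is involved.

```python
import queue
import threading


def lookup_hours(pilot):
    """Stand-in for a procedure that would live on a remote system."""
    return {"Smith": 412.5}.get(pilot)


# Remote-procedure-call style: synchronous. The caller blocks
# until the result comes back, as with an ordinary procedure call.
rpc_result = lookup_hours("Smith")

# Message-passing style: asynchronous. The sender enqueues a message
# and continues; a separate receiver consumes it later.
mailbox = queue.Queue()
received = []


def receiver():
    while True:
        message = mailbox.get()
        if message is None:       # sentinel: no more messages
            break
        received.append(lookup_hours(message))


worker = threading.Thread(target=receiver)
worker.start()
mailbox.put("Smith")   # returns immediately, without waiting for a reply
mailbox.put(None)
worker.join()          # wait here only so the result can be inspected
```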

The filing problems center on the different data 
formats used by different computer architectures. An example 


would be one system using ANSI retrieving a file from a system
using EBCDIC or a system that uses 16-bit words retrieving a
file from a system that uses 32-bit words.
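Both filing problems can be demonstrated with the Python standard library. The sketch assumes ASCII as the first system's character set and code page `cp037` as one representative EBCDIC variant; the sample string and integer are arbitrary.

```python
import struct

text = "LT SMITH"

# The same characters map to different byte values in each code set.
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")    # cp037 is one EBCDIC code page

# Reading EBCDIC bytes without translation yields unreadable text;
# decoding with the correct code page recovers the original.
misread = ebcdic_bytes.decode("latin-1")
roundtrip = ebcdic_bytes.decode("cp037")

# The same integer occupies a different number of bytes depending
# on the word size used to store it.
word16 = struct.pack(">H", 1000)   # 16-bit big-endian unsigned
word32 = struct.pack(">I", 1000)   # 32-bit big-endian unsigned
```

A file transfer between such systems therefore needs an explicit translation step for both character data and binary numbers.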

The authentication problems deal with the concerns of 
three broad problem areas: sources of distrust and diversity 
with respect to authentication; identifying the actual 
function of authentication and authorization; and 
accommodating the need for local autonomy within global 
authentication environments. 

The naming problems center on the naming scheme 
adopted for files or applications. Names come in two types, 
relative and absolute. An absolute name refers to the same 
object regardless of its context. This facilitates sharing
since a common vocabulary would be implied. A relative name
is context dependent. Relative naming has greater utility.
Another problem related to naming is the choice of a single
global homogeneous name space, or many local name spaces. The
choice of naming scheme will have a design impact on the 
development of any multidatabase system. 
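Absolute and relative names behave much like absolute and relative file paths, which gives a compact way to illustrate the distinction. The path names below are hypothetical, and `resolve` is a helper written for this sketch.

```python
import posixpath


def resolve(name, context):
    """Resolve a name against a context.

    posixpath.join discards the context when the name is absolute,
    so an absolute name resolves the same way everywhere.
    """
    return posixpath.normpath(posixpath.join(context, name))


absolute = "/navy/personnel/pilots"
relative = "personnel/pilots"

# An absolute name refers to the same object regardless of context.
same_a = resolve(absolute, "/navy")
same_b = resolve(absolute, "/usmc")

# A relative name depends on the context in which it is used.
in_navy = resolve(relative, "/navy")
in_usmc = resolve(relative, "/usmc")
```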

The final problem area deals with the user interface. 
Mark Weisner of the University of Maryland defined four levels 
of user interface heterogeneity: (1) what the user sees, (2)
what the application sees and provides, (3) what the window
system sees and provides, and (4) what the hardware provides
(Notkin, 1987, pp.48-49).


C. PROPOSED SOLUTIONS

There are two general approaches for providing integrated 
access to a collection of heterogeneous databases. They are 
the multidatabase or composite approach and the federated 
approach. 

1. The Multidatabase or Composite Approach 

The multidatabase or composite approach relies on a
global schema. The global schema provides a description of
the information in the heterogeneous composite databases and
makes up a logically single, integrated database. Access and
manipulation operations are expressed in a universal query
language and mediated through the global schema. This format
provides the user with the illusion of a centralized database
(Collet, 1991, pp.250).
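A global schema can be pictured as a mapping from global attribute names to the local names used in each component database. The sketch below is a minimal illustration, not the mechanism of any cited system; `DB_A`, `DB_B`, `GLOBAL_SCHEMA`, and `global_query` are all hypothetical names, and the rows are invented.

```python
# Hypothetical component databases with different local attribute names.
DB_A = [{"name": "Smith", "hours": 412.5}]
DB_B = [{"pilot": "Jones", "flt_hrs": 980.0}]
SOURCES = {"A": DB_A, "B": DB_B}

# Global schema: each global attribute is mapped to its local name
# in every component database.
GLOBAL_SCHEMA = {
    "pilot_name": {"A": "name", "B": "pilot"},
    "flight_hours": {"A": "hours", "B": "flt_hrs"},
}


def global_query(attributes):
    """Answer a query phrased in global terms by translating it per source."""
    result = []
    for source, rows in SOURCES.items():
        for row in rows:
            result.append(
                {g: row[GLOBAL_SCHEMA[g][source]] for g in attributes}
            )
    return result


all_pilots = global_query(["pilot_name", "flight_hours"])
```

The user sees a single set of attribute names, while the mapping hides the fact that two differently structured databases answered the query.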

Construction of a global schema is a difficult 
process. The main reason is the lack of a general solution 
for the semantic conflicts in a situation in which the
autonomy of each of the constituent databases is preserved
(Litwin, 1986, pp.213). Furthermore, the process must be
repeated every time a composite database schema changes or 
another composite database is added to the system. 

The users are not required to know what semantic 
conflicts exist among the composite databases. However, the
developers must provide explicit resolutions for the conflicts
before actual system use. In essence a centralized view of
all the composite databases is developed. This centralized or
virtual view may be different from the local views of the 
composite databases. The view discrepancy can cause problems 
in the execution of existing applications. 

2. The Federated Approach 

By contrast with composite or multidatabase systems, 
the federated database uses an organization model based on 
equal, autonomous databases, with sharing controlled by 
explicit interfaces (Heimbigner, 1985, pp.48). The user is 
shown a collection of local views along with tools for 
information sharing among the composite databases. In
essence, a virtual global schema is created.

Federated databases try to minimize central authority, 
yet support partial sharing and coordination among composite 
databases. Without the constraint of a central authority the 
federated system tries to maintain as much composite database 
autonomy as possible and still support strong information 
sharing. 

To facilitate the conflicting requirements of autonomy 
and data sharing, the federated architecture relies on three 
component schemas: private schema, export schema, and import 
schema. The private schema is the schema that describes a 
composite database and is stored at the location of the 
composite database. The export schema is the portion of the
schema that a particular composite database is willing to
share. The import schema specifies the information that
composite databases desire to use from other composite
databases (Heimbigner, 1985, pp.54).
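The relationship among the three schemas can be sketched with sets of attribute names. This is a simplified illustration under stated assumptions: the `Component` class and its `import_from` method are hypothetical, and a real federated architecture would negotiate far more than attribute names.

```python
class Component:
    """A component database in a federation (hypothetical sketch)."""

    def __init__(self, name, private_schema, export_schema):
        self.name = name
        self.private_schema = set(private_schema)
        # A component can export only what it actually holds.
        self.export_schema = set(export_schema) & self.private_schema
        # Maps a source component name to the attributes imported from it.
        self.import_schema = {}

    def import_from(self, other, wanted):
        """Import only the attributes the other component is willing to share."""
        granted = set(wanted) & other.export_schema
        self.import_schema[other.name] = granted
        return granted


personnel = Component("personnel", {"name", "grade", "ssn"}, {"name", "grade"})
flight_ops = Component("flight_ops", {"name", "flight_hours"}, {"flight_hours"})

# flight_ops asks for two attributes, but only the exported one is granted.
granted = flight_ops.import_from(personnel, {"grade", "ssn"})
```

Each component keeps control of its private schema, which is how the architecture preserves autonomy while still allowing sharing.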

Negotiation is another key feature of the federated 
architecture. The negotiation system is conceptually made up
of two parts: an interpreter and a collection of procedures
written in the negotiator’s language. This negotiation aspect
is where most of the heterogeneous conflicts are resolved.


D. INTEGRATING MODEL REQUIREMENTS 

Either approach requires a strong integrating model that 
is semantically rich enough to subsume the component
databases. The composite or multidatabase approach needs a
semantically rich model to build an all-encompassing global
schema. The federated approach needs a semantically rich model
to supply the needs of its negotiator. This thesis uses the
object-oriented model as the integrating model and develops a
framework for representing the semantic heterogeneity for this
model.


III. SCHEMA AND DATA CONFLICTS IN THE RELATIONAL MODEL


This chapter is a synopsis of an article by Won Kim and 
Jungyun Seo from the December 1991 issue of Computer magazine 
called "Classifying Schematic and Data Heterogeneity in 
Multidatabase Systems." The article developed a complete 
framework for enumerating and classifying the types of 
multidatabase system structural and representational
discrepancies. 

When viewed in a relational sense, the schema conflicts 
can be categorized in three main areas: table-versus-table
conflicts, attribute-versus-attribute conflicts, and table-
versus-attribute conflicts. Each of these can be further
broken down. We will look at each schema conflict subarea
separately, and then discuss the data conflicts.


A. TABLE-VERSUS-TABLE CONFLICTS 

The table-versus-table conflicts occur when different 
databases use different definitions to represent similar
information in tables. Table-versus-table conflicts can be
categorized as one-to-one and many-to-many table conflicts.
One-to-one table conflicts occur when the different databases
represent similar information using different names,
structures, and constraints in single tables. The table name
conflicts arise when different names are used in different
databases to represent semantically equivalent tables. An 
example would be one table named "document" that describes 
real world paper-media documents, and another database with a 
table "publication" that describes the same object. A second 
version of this conflict occurs when different databases use 
the same table name to represent semantically different 
tables. Going back to the document example, we compare this 
to another database that has a table named "document", yet its
attributes describe a file on a computer's hard drive rather
than a paper document. The table structure conflicts
occur when different databases have similar tables, yet the 
numbers of attributes in the tables differ. The table 
constraint problem arises from differences in the 
specifications of the tables in the different databases. 
These problems are associated with the use of primary, 
candidate, and foreign keys. If an attribute is a key in one 
database, but the corresponding attribute in the table of 
another database is not a key, it is difficult to impose 
constraints on this attribute at a homogenizing level. Thus, 
making updates based on a key with a conflict is difficult. 
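The two kinds of table name conflict can be detected mechanically once each table has been annotated with the real-world concept it describes. The sketch below reuses the document example; the `CATALOG` annotations and the labels "synonyms" and "homonyms" are assumptions introduced here for illustration, not terms from the source article.

```python
# Hypothetical catalog: (database, table) -> real-world concept described.
CATALOG = {
    ("db1", "document"): "paper document",
    ("db2", "publication"): "paper document",
    ("db3", "document"): "computer file",
}


def name_conflicts(catalog):
    """Return (synonyms, homonyms) among tables of different databases.

    synonyms: different names for semantically equivalent tables
    homonyms: the same name for semantically different tables
    """
    synonyms, homonyms = [], []
    entries = sorted(catalog.items())
    for i, ((db_a, tbl_a), concept_a) in enumerate(entries):
        for (db_b, tbl_b), concept_b in entries[i + 1:]:
            pair = (f"{db_a}.{tbl_a}", f"{db_b}.{tbl_b}")
            if concept_a == concept_b and tbl_a != tbl_b:
                synonyms.append(pair)
            elif concept_a != concept_b and tbl_a == tbl_b:
                homonyms.append(pair)
    return synonyms, homonyms


synonyms, homonyms = name_conflicts(CATALOG)
```

The hard part in practice is producing the concept annotations, since that step requires human knowledge of what each table means.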

Many-to-many table conflicts occur when different
databases use a different number of tables to represent the
same information. This type of conflict can usually be
decomposed into one-to-one table conflicts.


B. ATTRIBUTE-VERSUS-ATTRIBUTE CONFLICTS

The attribute-versus-attribute conflicts occur when 
different definitions for semantically equivalent attributes 
exist in different databases. Like table conflicts, one-to- 
one and many-to-many attribute conflicts exist. Many-to-many 
conflicts, however, can be decomposed into one-to-one
conflicts. The attribute-versus-attribute conflicts can be 
categorized as attribute name conflicts, default value 
conflicts, and attribute constraint conflicts. 

Attribute name conflicts arise from using different names 
for semantically equivalent attributes in different databases 
or when the same attribute name is used for semantically 
different attributes. This is very similar to the table name
conflicts described earlier. 

Attribute default value conflicts arise when one database 
enters a null when no attribute value is entered, while 
another database enters another default value when no value 
for the same attribute is entered. 

Attribute constraint conflicts fall under two types: data
type conflicts and attribute integrity-constraint conflicts.
Data type conflicts occur when semantically equivalent
attributes in different databases have different data types.
An example would be an attribute representing a social
security number stored as a numeric type in one database, and
as a character type in another database. Attribute integrity-
constraint conflicts are similar to default value conflicts.
Specifically, they deal with the field size of an attribute.
An example would be the attribute weight in one database being
defined as an integer less than 999, while the same attribute
is defined as an integer less than 9999 in another database.
This would cause a problem in homogenizing the two databases
when a four digit value is entered. It would work for one
database, but not the other.
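Both attribute constraint conflicts can be illustrated briefly. In this sketch `normalize_ssn` shows one possible way to reconcile a numeric and a character representation, and `accepted_by` applies the two weight limits from the example above; the function names, the database labels, and the sample values are hypothetical.

```python
def normalize_ssn(value):
    """Reconcile a numeric and a character SSN into one nine-character string.

    A numeric column loses leading zeros and a character column may
    contain dashes, so both are reduced to digits and left-padded.
    """
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return digits.zfill(9)


# Integrity constraints on "weight": less than 999 in one database,
# less than 9999 in the other.
LIMITS = {"db1": 999, "db2": 9999}


def accepted_by(weight):
    """Return the databases whose constraint admits the value."""
    return [db for db, limit in sorted(LIMITS.items()) if weight < limit]


from_numeric = normalize_ssn(12345678)         # numeric type, leading zero lost
from_character = normalize_ssn("012-34-5678")  # character type, with dashes
```

A four digit weight such as 1500 is accepted by only one of the two databases, which is exactly the homogenization problem described above.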


C. TABLE-VERSUS-ATTRIBUTE CONFLICTS 

The third category is table-versus-attribute conflicts. 
These conflicts occur if one database uses tables while 
another uses attributes to represent the same information. 
Often this conflict type can be regarded as a combination of 
many-to-many table conflicts and many-to-many attribute
conflicts.


D. DATA CONFLICTS 

In relational models the data conflicts fall into two 
subcategories: data conflicts that violate specified integrity
constraints and conflicts based on different representations 
for the same data. The first problem can be expressed as 
wrong data. This is generally caused by a failure to maintain 
a database or a failure to enforce integrity constraints. We 
see this problem when equivalent attributes of different 
databases are expected to have the same value, yet the values
are different. Another common cause of this problem is
obsolete data. This can be seen when two databases have
similar data, yet one has more frequent update periods. An
example would be two similar databases that track individual
flight hours. One is updated weekly and the other is updated 
monthly. If each database was queried for LT Smith’s flight 
hours, the results are likely to be different. 

The second type of data conflict, different
representations for the same data, can actually be viewed in
three different aspects. The first deals with different
expressions. This occurs when the same type of data has
different expressions in different databases. An example
would be listing LT Smith’s rank as "LT" in one database and
"O-3" in another. In USN terms both mean the same thing. The
second aspect deals with different units. These conflicts 
arise when different databases use different units for similar 
numeric data. An example of this would be a flight time 
database that uses minutes as the measurement of flight time, 
while another flight hour database uses hours and partial
hours to record the flight time. The third aspect is
different precision. Precision conflicts occur when two
similar databases use values from domains of different
cardinalities for the same attribute. An example would be one
database that uses light, medium, and heavy to describe the
weight of an aircraft, while another uses a numeric range of
100 to 200,000 pounds to describe weight. Figure 1 is a
synopsis of the schema and data conflict classifications (Kim,
1991, pp.12-18).
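Each of the three representation conflicts suggests a simple conversion, sketched below. The function names are hypothetical, and the weight thresholds in `weight_class` are arbitrary values chosen only so the example runs; the source gives no boundaries between light, medium, and heavy.

```python
# Expression conflict: two codes with one meaning.
GRADE_TO_RANK = {"O-3": "LT"}


def canonical_rank(code):
    """Map a pay-grade code to the rank abbreviation when one is known."""
    return GRADE_TO_RANK.get(code, code)


def to_hours(value, unit):
    """Unit conflict: express flight time in hours."""
    return value / 60 if unit == "minutes" else float(value)


def weight_class(pounds, medium_from=10_000, heavy_from=100_000):
    """Precision conflict: map a numeric weight onto the coarser domain.

    The thresholds are assumptions made for this sketch only.
    """
    if pounds >= heavy_from:
        return "heavy"
    return "medium" if pounds >= medium_from else "light"
```

The precision mapping works in one direction only: a numeric weight can be reduced to a class, but a class such as "medium" cannot be turned back into a specific weight.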


E. CONCLUSION 

The objective of this thesis is to develop a similar 
framework for classifying schematic and data conflicts in an 
object-oriented model. The next chapter introduces the
object model used in support of this endeavor.

Figure 1 Schema and Data Heterogeneity Conflicts in
Relational Models


IV. THE OBJECT-ORIENTED MODEL


The use of an object-oriented model gives us richer
semantics and greater modeling power than alternative
approaches. Additionally, an object-oriented model is an
ideal integration model for combining heterogeneous databases.


A. MANAGING COMPLEXITY 

An object-oriented model is used for representing and 
managing complexity in a problem domain. Although there is no 
general consensus on what constitutes an object-oriented 
model, there are some agreed-upon characteristics that give an
object-oriented model its semantic richness (Brown, 1991,
pp.20). These characteristics include data and procedural
abstractions, encapsulation, inheritance, associations,
communication via method connections, and function
overloading.


1. Abstraction 
There are two types of abstraction, procedural and 
data. Procedural abstraction is the principle that any 
operation that achieves a well defined effect can be treated 
by its users as a simple entity, despite the fact that the 
operation may actually be achieved by some sequence of lower-
level operations. Data abstraction is the principle of
defining a data type in terms of the operations that apply to
objects of that type, with the constraint that the values of
such objects can be modified and observed only by the use of
the operations (Coad, 1991, pp.14).

2. Encapsulation 

Encapsulation is a facility that serves to protect 
some part of a program or data against improper access. 
Central to the object-oriented model is the concept that the 
entities of interest in the real world can be modeled most 
effectively by representing each real-world entity as an 
object in the model. The definition of such an object 
includes both the data properties of that object and the 
operators which are permitted to manipulate that object. The 
essence of encapsulation is that such operators form an 
interface to objects which provides the only way to amend the 
state of the objects. The user of an object has no way to 
access that object other than through the defined set of 
operators (Brown, 1991, pp.19). Encapsulation is often used 
to enforce information hiding. The power of encapsulation is 
that it keeps related content together. 
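
As a minimal sketch of encapsulation, the following Python class 
keeps its state internal and exposes only a defined set of 
operators. The class name Locker and its methods are hypothetical 
and chosen only for illustration; they are not part of the model 
described in this chapter. 

```python
class Locker:
    """Hypothetical object whose state changes only through its operators."""

    def __init__(self, number: int) -> None:
        self._number = number      # leading underscore marks internal state
        self._occupant = None

    def assign(self, occupant: str) -> None:
        """Operator that amends the state; rejects an improper change."""
        if self._occupant is not None:
            raise ValueError(f"locker {self._number} is already assigned")
        self._occupant = occupant

    def vacate(self) -> None:
        """Operator that returns the locker to the unassigned state."""
        self._occupant = None

    @property
    def occupant(self):
        """Read-only view of the current state."""
        return self._occupant


locker = Locker(12)
locker.assign("student A")
print(locker.occupant)  # prints: student A
```

Python enforces this kind of hiding by convention only, so the 
sketch illustrates the principle rather than a strict guarantee. 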

3. Inheritance 

Inheritance is a mechanism for expressing similarity 
among classes, and for simplifying definitions of classes 
similar to those previously defined. In general, we find that a 
subclass hierarchy can be defined in which a subclass is a 
specialization of its superclass in the hierarchy. An 
important aspect of this specialization is that we do not need 
to define each subclass from scratch. We think of a subclass 
as inheriting the behavior of its superclass (Brown, 1991). 
This inheritance portrays generalization and 
specialization, making common attributes and functions explicit 
within a class hierarchy. Inheritance allows for the explicit 
expression of commonality (Coad, 1991, pp.15). 

4. Association 

Association is the ability to tie together certain 
things that happen at some point in time or under similar 
circumstances. In constructing any type of information model, 
we are concerned with identifying associations between things 
in the real world and reflecting those associations as 
precisely stated relationships in the model (Shlaer, 1988, 
pp.47). To have an association is to have some logical 
connection. 

5. Communication and Method Overloading 

Communication with messages is a principle for 
managing complexity, especially for interfacing different 
objects. This communication takes the form of producing 
functions (or methods). Message data is passed to an object, 
and the data in the message causes reactions within the object. 
These reactions can be thought of as methods. Methods with 
the same name can cause different reactions depending on the 
amount and type of data supplied in the message. This allows 
for method overloading. 
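
The idea that one method name can react differently to different 
message data can be sketched in Python with 
functools.singledispatchmethod, which selects an implementation 
from the type of the first argument. The class Schedule and the 
method find are hypothetical names used only for illustration. 
Python has no built-in overloading on the amount of data, so this 
sketch covers only dispatch on type. 

```python
from functools import singledispatchmethod


class Schedule:
    """Hypothetical receiver object; names are illustrative only."""

    @singledispatchmethod
    def find(self, key):
        # Fallback reaction for message data of an unsupported type.
        raise TypeError(f"unsupported message data: {type(key).__name__}")

    @find.register
    def _(self, key: int) -> str:
        # Reaction when the message carries a number.
        return f"lookup by event number {key}"

    @find.register
    def _(self, key: str) -> str:
        # Reaction when the message carries a name.
        return f"lookup by event name {key!r}"


schedule = Schedule()
print(schedule.find(7))
print(schedule.find("brief"))
```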


B. THE BUILDING BLOCKS 
1. Class and Object&Class 

An object is an abstraction of something in a problem 
domain, reflecting the capabilities of a system to keep 
information about it, interact with it, or both (Coad, 1991, 
pp.53). Another way of looking at an object is that it is an 
encapsulation envelope. It encapsulates knowledge in the form 
of attribute values and exclusive methods that can be 
performed with or on the object.¹ 

A class is a description of one or more objects with a 
uniform set of attributes and methods, including a description 
of how to create new objects in the class. The objects of a 
class share common traits or attributes and have the same 
behavior. Figure 2 shows the symbology used to represent an 
object&class and a class. 

The top part of either the object&class or the class 
symbol contains the name of the object or class. This name is 
a noun that describes the basic concept of the object. The 
middle area of the object&class or class symbol contains the 
attributes of the object&class or class. The bottom area of 
the symbol contains a list of the methods associated with the 
object&class or class. 

¹ The object model used in this paper is based on the 
model proposed by Coad and Yourdon in "Object-Oriented 
Analysis", Yourdon Press, 1991. 

Figure 2 Class and Object&Class 

2. Structure 

The structure is a symbolic expression of the problem 
domain complexity, pertinent to the system's responsibility 
(Coad, 1991, pp.78). It indicates the relationships among the 
object&classes and classes. In this model two types of 
structure exist: generalization-specialization structure and 
whole-part structure. 

Generalization-specialization structure is used to 
distinguish between similar but not identical classes. The 
attributes and methods germane to the generalization class are 
inherited by the specialization class. The 
generalization-specialization structure provides a method of 
organization that implies inheritance from the generalization 
class to the specialization class and allows for an explicit 
representation of additional attributes and methods pertinent 
to the specialization class. This structure notation is shown 
in Figure 3. 


[Diagram: a GENERALIZATION class above two specialization classes, SPECIALIZATION1 and SPECIALIZATION2] 

Figure 3 Generalization-Specialization Hierarchy Structure 

There are two forms of generalization-specialization 
structure: a hierarchy (as depicted in Figure 3) or a lattice. 
Though the hierarchy 
form is the most common, the lattice structure can capture 
more information. Specifically, the lattice structure can 
highlight additional specializations and explicitly capture 
commonality while only modestly increasing model complexity 
(Coad, 1991, pp.89). This notation is depicted in Figure 4. 

Figure 4 Generalization-Specialization Lattice Structure 

The whole-part structure is based on a basic method of 
organization. It groups a whole object with the parts of that 
object. An example of this would be a whole object called 
ship, associated with a number of part objects like 
propulsion-plant and cargo. The notation used to represent a 
whole-part structure displays directionality and explicitly 
shows the number of parts related to the whole. The notation is 
depicted in Figure 5. 
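
The ship example can be sketched in Python with dataclasses. The 
attribute names and the cardinalities (exactly one propulsion plant, 
zero or more cargo items) are assumptions made for illustration and 
are not taken from Figure 5. 

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PropulsionPlant:
    kind: str


@dataclass
class Cargo:
    description: str


@dataclass
class Ship:
    """Whole object that holds its part objects."""
    name: str
    propulsion_plant: PropulsionPlant                  # assumed: exactly one part
    cargo: List[Cargo] = field(default_factory=list)   # assumed: zero or more parts


ship = Ship("Example", PropulsionPlant("diesel"))
ship.cargo.append(Cargo("training manuals"))
print(len(ship.cargo))  # prints: 1
```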

The term multiple structures is used to describe 
combinations of generalization-specialization and whole-part 
structures. The essence of structure is that it is an 
expression of problem-domain complexity pertinent to the 
system’s responsibility. Structure is used as an overall term 
describing both generalization-specialization and whole-part 
structures (Coad, 1991, pp.99). 

Figure 5 Whole-Part Structure 


3. Attributes 

Attributes add detail to the class, object&class, and 
structure abstractions. An attribute is some data or state 
information for which each object in a class has its own 
value. Attributes may only be changed by exclusive methods. 
If another part of the system needs to manipulate an attribute 
of an object, it must specify a message connection that 
carries information to a method defined by the object (Coad, 
1991, pp.120). 

Each attribute of an object must capture a complete, 
compact concept. This concept must be important to the 
problem domain. Making each attribute a complete, compact 
concept reduces the number of attributes that must be included 
in the object. This leads to a simpler model for review. 

Attributes of a generalized class in a 
generalization-specialization structure also apply to the 
specialization objects of that generalized class. Determining 
where attributes should be placed in a structure is an important 
part of determining the generalization-specialization 
structure. 

4. Instance Connections 

Instance connections model associations. An instance 
connection models the problem domain mapping that one object 
needs with other objects in order to fulfill its 
responsibilities. A one-to-one instance connection can 
represent a mandatory association between one object 
and another, an optional association, or an association that is 
mandatory one way but optional in the other direction. 
One-to-many connections, or a range of possibilities, are also 
captured in instance connection symbology (Coad, 1991). An 
instance connection is modeled in Figure 6. 

5. Methods 

A method is a specific process that uses data from an 
object. Up to now we have discussed how to model data. 
Methods are how we model processes. 

Each object exists in different states. The state of 
an object is reflected by the values of its attributes. 
Methods are the processes that change the values of the 
attributes. This implies that knowledge about the state of an 
object is important when examining methods. 

Figure 6 Instance Connection 

The methods of a system can be divided into two main 
types: algorithmically-simple methods and 
algorithmically-complex methods. Algorithmically-simple methods 
apply to each class and object in the model. They are create, 
connect, access, and release methods. The create method is used 
to create and initialize a new object in a class. The connect 
method connects or disconnects an object with another object. 
The access method sets the attribute values of an object. The 
release method deletes an object. The algorithmically-complex 
methods fall into two categories. The calculate category 
methods use attribute values to calculate specific results. 
The monitor category methods monitor external systems or 
devices. 
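
The four algorithmically-simple methods can be sketched as a small 
Python base class. The class name ModelObject, the registry list, 
and the use of a dictionary for attribute values are assumptions 
made for illustration; the model itself does not prescribe an 
implementation. 

```python
class ModelObject:
    """Hypothetical base class offering create, connect, access, release."""

    _instances = []  # assumed registry of objects that currently exist

    def __init__(self, **attributes):
        self.attributes = dict(attributes)
        self.connections = set()

    @classmethod
    def create(cls, **attributes):
        """Create and initialize a new object in the class."""
        obj = cls(**attributes)
        cls._instances.append(obj)
        return obj

    def connect(self, other):
        """Connect this object with another object (both directions)."""
        self.connections.add(other)
        other.connections.add(self)

    def disconnect(self, other):
        """Remove the connection between this object and another."""
        self.connections.discard(other)
        other.connections.discard(self)

    def access(self, name, value):
        """Set an attribute value of the object."""
        self.attributes[name] = value

    def release(self):
        """Delete the object: drop its connections and registry entry."""
        for other in list(self.connections):
            self.disconnect(other)
        type(self)._instances.remove(self)


doc = ModelObject.create(title="Example publication")
holder = ModelObject.create(name="student A")
doc.connect(holder)
doc.access("status", "Out")
doc.release()
```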
6. Message Connections 

Message connections are the means of connecting objects 
to facilitate methods. These connections exist solely for the 
benefit of the methods. Each message connection represents 
values sent within the context of a particular method and a 
response as a result of that method (Coad, 1991, pp.155). The 
notation for a message connection is a dashed arrow connecting 
objects or a class to objects as shown in Figure 7. 

Figure 7 Message Connection 


Additionally, one message connection sender object can 
send a message to multiple receiver objects. The values sent 
in the message connection invoke methods in each object that 
receives the connection. The notation is shown in Figure 8. 

Figure 8 Multiple Message Connections 


C. CONCLUSION 

This chapter explained the tools used to analyze the 
problem domain in the framework of object-oriented analysis. 
The notations used were adopted from "Object-Oriented 
Analysis" by Peter Coad and Edward Yourdon. These tools give 
us strong modeling power and provide the constructs needed 
to represent the semantics of many data models. These 
concepts and tools are used in the next chapter to transform 
schemas of databases based on several data models into 
equivalent schemas in an object-oriented model in preparation 
for integration into a global schema. 

V. HETEROGENEOUS DATABASE SCENARIO 


A. BACKGROUND 

The database scenario used in this thesis comes from a 
U.S. Navy Maritime Patrol Training Squadron. The squadron is 
the east coast training squadron for all P-3C maritime patrol 
aircraft aircrew positions. We examine four heterogeneous 
databases that have been independently developed to support 
various applications of the squadron. Specifically, the 
databases include a library database for classified material, 
an aircrew physiology database, a Naval Air Training and 
Operating Procedures Standardization (NATOPS) department 
database, and a flight schedule database. 

These databases were developed by different people at 
different times using different data models. They have never 
been standardized in terms of data elements. All of them run 
on personal computers. They were developed with either "Q and 
A", "Dbase III plus", or "Enable". The command is interested 
in developing more databases, but lacks the expertise to 
design and implement an overall database application that 
covers all of its needs. 

Each of the databases being examined has a specific 
purpose. The library for classified material database 
maintains data on all classified material used for 
instruction. The flight physiology database tracks the status 
of student and staff aircrewmen with regard to physiology, 
survival, and other expiring qualifications. The NATOPS 
department database tracks NATOPS qualifications and NATOPS 
publications issued to all aircrewmen. The flight schedule 
database is used to promulgate a daily schedule of flight and 
ground training events. Frequently, queries that span 
these databases need to be answered. Currently, this 
is accomplished manually through a tedious procedure. First, 
the databases that contain the data to be accessed are 
identified. Second, several queries in different languages 
are formulated and executed on the different databases. 
Third, the results are transferred to the requesting site, 
combined, and the requested information extracted and 
formatted. Additionally, considerable overlap occurs among 
the four databases. 

To allow queries that span several databases, a federated 
approach is suggested. With this approach, each local 
database is considered a logical component in the federation 
(Heimberger, 1985, pp.48). The components are tied together 
by explicit interfaces that form a virtual global schema that 
represents the integration of the local schemas. To accomplish 
this, several steps are necessary. First, each local schema is 
transformed into an equivalent schema in a semantically rich 
common data model. Second, schema conflicts need to be 
identified and resolved. Third, the local schemas in the 
common data model are merged to form a virtual global schema. 
Fourth, an additional control component, known as the global 
controller, is required. The global controller maintains the 
definition of the virtual global schema and acts as a 
coordinator and translator: it receives a global query, 
possibly in a user-specific language; translates it into an 
equivalent query on the common-model global schema; decomposes 
and translates the common-model query into subqueries to the 
corresponding local database sites for processing; collects 
the results; identifies and resolves data content conflicts; 
reformats the results; and sends them back to the originating 
site. 
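
The decompose-and-collect part of the global controller can be 
sketched in Python as follows. This is a simplified illustration 
under stated assumptions, not the controller design itself: the 
global attribute names ssn and crew, the site names, and the sample 
rows are hypothetical, and the sketch omits query-language 
translation and data content conflict resolution. The local field 
names follow the data dictionaries given later in this chapter. 

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical local sites. Each maps a global attribute name to its local
# field name and holds made-up sample rows.
LOCAL_SITES = {
    "physiology": {"mapping": {"ssn": "Soc", "crew": "Crewnum"},
                   "rows": [{"Soc": "000000001", "Crewnum": "12"}]},
    "natops": {"mapping": {"ssn": "Socsec", "crew": "CrewNum"},
               "rows": [{"Socsec": "000000001", "CrewNum": "12"}]},
}


def decompose(global_attrs: List[str]) -> Dict[str, List[str]]:
    """Translate global attribute names into one subquery per local site."""
    return {site: [info["mapping"][attr] for attr in global_attrs]
            for site, info in LOCAL_SITES.items()}


def run_global_query(global_attrs: List[str]) -> List[Row]:
    """Run each subquery locally and collect results under global names."""
    merged: List[Row] = []
    for site, local_fields in decompose(global_attrs).items():
        for row in LOCAL_SITES[site]["rows"]:
            result: Row = {"site": site}
            for global_name, local_name in zip(global_attrs, local_fields):
                result[global_name] = row[local_name]
            merged.append(result)
    return merged


for row in run_global_query(["ssn", "crew"]):
    print(row)
```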


B. THE LIBRARY FOR CLASSIFIED MATERIAL DATABASE 

The library database contains the data necessary to track 
classified documents that are issued to students and 
instructors while under a course of instruction at the 
training squadron. The data is grouped by different 
components: the library data includes the name of the library 
and the custodian; document data includes the publication 
name, document number, classification, status (checked in or 
out), and, if the status is checked out, the social security 
number of the document holder; student data includes name, social 
security number, locker number, secret folder numbers, class 
number, and crew number; staff data includes name, social 
security number, locker number, secret folder numbers, safe 
number, and crew number. 

Each document has a serial number; each student has an 
assigned locker and secret folder number; and each instructor 
has a secret folder number. Documents classified below the 
secret level are issued to students. The students store them 
in their confidential lockers when not in use. Secret 
documents are issued to students, but stored in the student's 
secret folder in the IML vault. The students check out their 
folders when they need documents for class, study, or flights. 
Instructors can check out confidential documents and store 
them in approved safes. Additionally, they can check out 
secret documents and store them in an approved safe or use a 
secret folder in IML. The choice for instructors comes down 
to a matter of convenience; however, all applicable security 
precautions apply. 

The IML staff conducts a daily inventory of all secret 
material. This is conducted at the end of the normal work 
day. This inventory includes all secret material not issued, 
all student secret material, and all instructor material 
stored in IML’s vault. Instructor material stored in 
individual safes is periodically inventoried. All other 
student material is inventoried upon check-in and check-out. 
This occurs every six weeks. 

Problems that arise are usually related to the flight 
schedule. Often, an individual has material signed out, and 
is on a flight or trainer that is scheduled to land or finish 
after normal working hours. This material is stored in a 
separate safe and inventoried the next morning. 

1. Classified Library Relational Diagram 

The classified library relational diagram is 
depicted in Figure 9. 

Figure 9 Library of Classified Material Relational Diagram 

2. Classified Library Data Dictionary 


NAME             TYPE (Length)     Description 

LibName          Character (30)    Name of Library 
Custodian        Character (35)    Name of Custodian 
Publication      Character (50)    Title of Document 
DocNum           Numeric (5)       Assigned Serial Number 
Status           Character (4)     Checked In or Out 
Classification   Character (6)     Conf, Secret, or None 
Name             Character (25)    Name and Rank of Person 
Ss               Numeric (9)       Social Security Number 
LockerNum        Numeric (3)       Assigned Locker Number 
SecFolderNum     Numeric (3)       Assigned Secret Folder 
ClassNum         Numeric (4)       Assigned Class Number 
CrewNum          Character (2)     Assigned Crew Number 
OfficeNum        Character (3)     Office Number 
SafeNum          Character (3)     Safe Number 

3. Transformation Process 

The transformation process is started by examining 
each table to see if it could be modeled as an object. Most 
objects are either tangible things, roles, incidents, 
interactions, or specifications (Shlaer, 1988, pp.14). 

The easiest objects to identify are the tangible 
things. Library, document, and document holder fit in this 
category. Student and staff also fit, but they are specific 
types of document holders. They are modeled as 
specialization objects of the generalization object document 
holder. The Document-DocHolder relationship is not an object 
but conveys a necessary relationship that must be modeled. 

After determining objects, we list the attributes 
associated with each one. Next, we examine the application to 
determine the methods associated with each object. The 
implicit methods of add, edit, and delete are not modeled. 
They are implied in the object class structure. The methods 
that must be modeled are library inventory, custody reports, 
individual inventory, check-in and check-out. 
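
A minimal Python sketch of this transformation is given below. It 
covers only the document holder generalization, the 
Document-DocHolder relationship, and the check-out, check-in, and 
individual inventory methods. The field selection, the snake_case 
names, and the choice to store the relationship as a holder 
reference on the document are assumptions for illustration and may 
differ from Figure 10. 

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocHolder:
    """Generalization object for anyone who can hold documents."""
    name: str
    ss: str


@dataclass
class Student(DocHolder):
    class_num: int = 0


@dataclass
class Staff(DocHolder):
    safe_num: str = ""


@dataclass
class Document:
    publication: str
    doc_num: int
    classification: str
    holder: Optional[DocHolder] = None   # None means checked in

    def check_out(self, holder: DocHolder) -> None:
        """Record custody; a document can have only one holder at a time."""
        if self.holder is not None:
            raise ValueError("document is already checked out")
        self.holder = holder

    def check_in(self) -> None:
        """Return the document to the library."""
        self.holder = None


def individual_inventory(documents: List[Document],
                         holder: DocHolder) -> List[Document]:
    """List the documents currently in the custody of one holder."""
    return [doc for doc in documents if doc.holder is holder]
```

A custody report or library inventory could be built in the same 
way by filtering the document list on holder or classification. 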
4. The Classified Library Object Model 

The classified library object model is depicted in 
Figure 10. 

Figure 10 Classified Library Object Model 

C. THE FLIGHT PHYSIOLOGY DATABASE 

The flight physiology database consists of data on 
officers, enlisted personnel, and aircrew qualification 
requirements. The data is grouped by the different 
components: officer data includes name, rank, social security 
number, birth month, and designator (pilot or flight officer); 
enlisted data includes name, rate, birth month, and social 
security number; requirement data includes type of requirement 
and the date it is due. 

Every naval aircrew member must hold certain expiring 
qualifications to continue flying. These include flight 
physicals, basic survival swim qualifications, advanced water 
survival (DWEST), flight physiology training, instrument 
qualifications, SERE, and NATOPS qualifications. The aircrew 
position determines which events are required and how often. 
If a required qualification lapses, the particular aircrew 
member is considered in a down status until that qualification 
is obtained. While in a down status he cannot perform his 
normal aircrew duties. 

1. Flight Physiology Relational Diagram 

The flight physiology relational diagram is depicted 
in Figure 11. 

Figure 11 Flight Physiology Relational Diagram 

2. Flight Physiology Data Dictionary 

NAME         TYPE (Length)    Description 

LastName     Character (25)   Last Name of Individual 
FirstName    Character (25)   First Name of Individual 
MI           Character (1)    Middle Initial 
Rank         Character (5)    Rank 
Bmonth       Character (3)    Birth Month 
Designator   Character (4)    Designator of Officer 
Soc          Character (9)    Social Security Number 
Name         Character (15)   Type of Qualification 
DueDate      Date             Date Qual Complete 
Rate         Character (7)    Rate of Enlisted 
Crewnum      Character (4)    Number of assigned crew 


3. Transformation Process 

All of the tables can be transformed to objects; 
however, officer and enlisted share a number of attributes. 
This similarity is captured by using a 
generalization-specialization structure. A class is created to 
capture the similar attributes. This class is labeled 
"servicemember" and has the attributes lastname, firstname, 
mi, and soc. The specialization objects of this class are 
"enlisted" and "officer". 

The methods are derived from the applications of the 
database. The database is used for planning inputs to a 
master schedule and to notify individuals of expiring 
qualifications. These methods are listed as "planninglist" 
and "notification". 
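
The generalization-specialization structure described above can be 
sketched in Python as follows. The attribute names come from the 
text and the data dictionary; placing the notification method on 
the generalization class and the wording of its message are 
assumptions made for illustration. 

```python
from dataclasses import dataclass


@dataclass
class Servicemember:
    """Generalization class holding the attributes shared by both tables."""
    lastname: str
    firstname: str
    mi: str
    soc: str

    def notification(self, requirement: str) -> str:
        """Build a notice for an expiring qualification (assumed format)."""
        return f"{self.firstname} {self.mi}. {self.lastname}: {requirement} is expiring"


@dataclass
class Officer(Servicemember):
    """Specialization adding the officer-only attributes."""
    rank: str
    designator: str


@dataclass
class Enlisted(Servicemember):
    """Specialization adding the enlisted-only attribute."""
    rate: str
```

Both specializations inherit lastname, firstname, mi, and soc, so 
a method written against Servicemember applies to officers and 
enlisted personnel alike. 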

4. The Flight Physiology Object Model 

The flight physiology object model is depicted in 
Figure 12. 

Figure 12 Flight Physiology Object Model 

D. THE NATOPS DEPARTMENT DATABASE 

The NATOPS department database consists of data on 
officers, enlisted personnel, and NATOPS qualification 
requirements. The data is grouped by the different components: 
aircrew data includes name, rank, social security number, 
position, and crew number; test and check flight data includes 
type of test or check flight, date of item, test or flight 
administrator, and score; publication data includes name, 
number, and current change number. 

This database is used to track the NATOPS qualification 
progress of aircrew. It keeps track of open and closed book 
test scores, oral exam scores, and flight grades. Each 
aircrew student is associated with an aircrew position. That 
position is associated with required tests, oral exams, and 
flights. Additionally, the department tracks NATOPS 
publications issued to all squadron aircrew. 

Two primary applications are associated with this 
database. In the first application, publications are tracked 
for accountability. Additionally, the database assists with 
recall when updates to publications are required. 
The second application is generating a 90 day planning tool 
in which all personnel needing renewed or initial NATOPS 
qualifications are tracked 90 days before their due date. This 
tool is used as an input to a monthly planning calendar. 
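
The 90 day planning tool amounts to a date-window filter. A 
minimal Python sketch follows; the function name planning_list, 
the tuple layout, and the decision to exclude qualifications that 
are already past due are assumptions, since the text does not 
specify them. 

```python
from datetime import date, timedelta
from typing import List, Tuple


def planning_list(quals: List[Tuple[str, date]], today: date,
                  window_days: int = 90) -> List[Tuple[str, date]]:
    """Return (name, due_date) pairs due within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [(name, due_date) for name, due_date in quals
           if today <= due_date <= cutoff]
    return sorted(due, key=lambda item: item[1])


sample = [("aircrew A", date(1993, 2, 1)), ("aircrew B", date(1993, 6, 1))]
print(planning_list(sample, today=date(1993, 1, 1)))
```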
1. NATOPS Department Relational Diagram 

The NATOPS department relational diagram is depicted 
in Figure 13. 

Figure 13 NATOPS Department Relational Diagram 





2. NATOPS Department Data Dictionary 

NAME         TYPE (Length)    Description 

LastName     Character (25)   Last Name of Individual 
FirstName    Character (25)   First Name of Individual 
Grade        Character (3)    Paygrade 
Position     Character (3)    Crew position 
Socsec       Character (9)    Social Security Number 
Type         Character (6)    Open, Closed, etc. 
Date         Date             Date Obtained 
Issuedby     Character (25)   Name 
Score        Numeric (4)      Numeric score obtained 
PubName      Character (35)   Name of Publication 
PubNum       Character (6)    Serial Number 
ChangeNum    Character (4)    Latest change entered 
BC           Logical          Blue Card Holder Y/N 
CrewNum      Character (4)    Crew Number 

3. Transformation Process 


All of the tables can be modeled as objects. 
Additionally, none of the tables share similar attributes, so 
each table is transformed directly into an 
object. 

In determining the methods we examine the applications 
that access this database. One primary application is 
maintaining a publication inventory list. The second 
application is generating a 90 day planning input for NATOPS 
qualifications that expire in the next 90 days. 
A third application is sending out change notices for 
publications, and tracking change entries into applicable 
NATOPS related publications. The last application is 
tracking individual performance. 
4. NATOPS Department Object Model 

The NATOPS department object model is depicted in 
Figure 14. 

Figure 14 NATOPS Department Object Model 


E. THE FLIGHT SCHEDULE DATABASE 

The flight schedule database consists of data on flight 
events, simulator events, required events, staff crews, 
student crews, and aircrew added to or subtracted from an 
event. The data is grouped by the different components: flight 
event data includes event number, aircraft, preflight time, 
takeoff time, land time, staff crew number, and student crew 
number; simulator event data includes event number, simulator 
number, lab number, student crew number, staff crew number, 
brief time, and end time; required event data includes event 
number, event name, room number, start time, and end time; 
staff crew data includes staff crew number and the various 
staff personnel assigned to that crew; student crew data 
includes the student crew number and the various students 
assigned to that crew. 

The flight schedule database is used as a planning and 
execution tool to promulgate a daily flight schedule. Events 
are the primary focus of the database. These events are 
either ground training events, simulator events, flight 
events, or administrative events. Each event is given a time 
block. Additionally, the required assets and personnel are 
identified for each event. 

The primary application is to ensure that required 
training is accomplished without double scheduling personnel. 
Assets may or may not be double scheduled depending on the 
event. In terms of assets, it is possible to double schedule 
most ground training events, but assets for flights and 
simulators cannot be double scheduled. Administrative events 
cannot be double scheduled unless they are of a large meeting 
type. An example would be an all officers meeting, which 
would apply to all officers not otherwise scheduled. 

1. Flight Schedule Relational Diagram 

The flight schedule relational diagram is depicted in 
Figures 15 and 16. 

Figure 15 Flight Schedule Relational Diagram 

[Diagram: Staff Crew with StaffPPC, StaffNFO, and StaffSS1; Student Crew with StuPilot, StuAW, and StuNFO] 

Figure 16 Flight Schedule Diagram Part 2 

2. Flight Schedule Data Dictionary 


NAME         TYPE (Length)    Description 

EventNum     Numeric (2)      Event Number 
iyo          Character (6)    Flight or Sim 
A/C          Character (5)    Aircraft Side Number 
Pee          Numeric (4)      Preflight Time 
epmie        Numeric (4)      Takeoff Time 
Land         Numeric (4)      Land Time 
StfCrew      Numeric (2)      Staff Crew Number 
StuCrew      Numeric (2)      Student Crew Number 
PPCNum       Numeric (3)      Pilot ID Number 
TCNum        Numeric (3)      NFO ID Number 
NCNum        Numeric (3)      NFO ID Number 
SS1Num       Numeric (3)      AW ID Number 
Position     Character (4)    Position Code (PPC, TC, ...) 
MemNum       Numeric (3)      Member Number 
STUPIL1      Numeric (3)      Student Pilot ID Number 
STUPIL2      Numeric (3)      Student Pilot ID Number 
STUTC        Numeric (3)      Student NFO ID Number 
STUNC        Numeric (3)      Student NFO ID Number 
STUSS1       Numeric (3)      Student AW ID Number 
SimNum       Character (5)    Simulator ID Number 
LabNum       Character (5)    Assigned Lab Number 
Brief        Numeric (4)      Brief Time 
End          Numeric (4)      Session End Time 
Rank         Character        Rank or Rate of Person 
Name         Character (35)   Name of Person 
StuAWNum     Numeric (3)      Enlisted AW ID Number 
StuPCNum     Numeric (3)      Student Pilot ID Number 
StuNFONum    Numeric (3)      Student NFO ID Number 
Room         Character (4)    Room for Admin Event 
Title        Character (5)    Name of Admin Event 

3. Transformation Process 


This is a more complex structure than the previous 
databases. To capture this complexity we use a number of 
whole-part structures and generalization-specialization 
structures. Starting with the fltschedule table, we transform 
this into an object with parts flight and simulator. 
Additionally, both staffcrew and studentcrew are treated as 
parts of simulator and flight. Likewise, plusmember and 
minusmember are treated as objects with connections to flight 
and simulator. 

Some of the tables have similar attributes. To 
capture this, we use a generalization-specialization 
structure. We start by building a generalization class of 
identical attributes from the aircrew position related tables. 
We then add a number of specialization class-objects to cover 
the non-related attributes in staffppc, staffnfo, staffss1, 
stuaw, stupilot, and stunfo. This generalization-specialization 
structure is also part of a whole-part structure with 
staffcrew or studentcrew. 

To determine the methods, we must examine the 
applications of the database. The primary objective is to 
schedule required training without double scheduling 
personnel. 
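
A check for double scheduled personnel can be sketched in Python 
as an interval-overlap test. The Event fields, the use of integer 
24-hour clock times (matching the Numeric (4) time fields in the 
data dictionary), and the assumption that no event crosses midnight 
are all illustrative choices rather than part of the object model. 

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_num: int
    start: int                  # 24-hour clock time, e.g. 1330
    end: int
    personnel: FrozenSet[str]   # ID numbers of everyone on the event


def overlaps(a: Event, b: Event) -> bool:
    """Two time blocks overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def double_scheduled(events: List[Event]) -> List[Tuple[str, int, int]]:
    """Return (person, event, event) for each person on two overlapping events."""
    conflicts = []
    for i, first in enumerate(events):
        for second in events[i + 1:]:
            if overlaps(first, second):
                for person in sorted(first.personnel & second.personnel):
                    conflicts.append((person, first.event_num, second.event_num))
    return conflicts


flight = Event(1, 800, 1200, frozenset({"P1", "N1"}))
simulator = Event(2, 1100, 1500, frozenset({"P1", "A1"}))
print(double_scheduled([flight, simulator]))  # prints: [('P1', 1, 2)]
```

The same overlap test could be applied to assets such as aircraft 
or simulators by comparing asset identifiers instead of personnel. 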

4. Flight Schedule Object Model 

The flight schedule object model is depicted in 
Figures 17 and 18. 

Figure 17 Flight Schedule Object Model Part 1 

Figure 18 Flight Schedule Object Model Part 2 

VI. FRAMEWORK FOR SEMANTIC HETEROGENEITY 


Schematic and data conflicts between databases are a 
crucial problem in building multidatabase systems (Kim, 1991). 
The conflicts are caused by structural and 
representational discrepancies between component 
databases. To build a homogenizing layer, a global schema is 
needed. These conflicts must be resolved before constructing 
a global schema. To accomplish this we build a framework for 
enumerating and classifying the types of semantic 
heterogeneity that could exist in the object-oriented database 
model. The scenario developed in Chapter V will be used to 
illustrate the conflicts of the framework. The object model 
used is the model proposed in Chapter IV. 

Semantic conflicts are divided into schema and data 
content conflicts (Kim, 1991). Schema conflicts 
are classified broadly by the level at which they could occur. 
These levels are: object level conflicts, attribute level 
conflicts, object-attribute level conflicts, and object method 
conflicts. The framework covers two primary types of data 
conflicts: inconsistencies and different representations for 
the same data. Each level of the framework is discussed in 
detail. Figure 19 is an overview of the types of conflicts 
that are described. 

A. Schema Conflicts 
1. Object Level Conflicts 
- Object name conflicts 
- Object structure conflicts 
2. Attribute Level Conflicts 
- Attribute name conflicts 
- Attribute constraint conflicts 
- Attribute structure conflicts 
3. Object-Attribute Level Conflicts 
- Object-attribute structure conflicts 
4. Method Conflicts 
- Method name conflicts 
- Method connection conflicts 
B. Data Conflicts 
1. Inconsistencies 
2. Different Representations for the Same Data 
- Different expressions 
- Different units 
- Different granularities 

Figure 19 Object Model Semantic Conflicts 

A. SCHEMA CONFLICTS 


1. Object level conflicts 
Object level conflicts occur when the heterogeneous
databases use different representations for similar objects.
These can be decomposed into object name conflicts and object
structure conflicts.
a. Object name conflicts

Object name conflicts are of two types. The first
is a homonym problem, exhibited when the same name is used in
two databases to denote semantically different objects. The
second is a synonym problem, which occurs when different names
are used to denote semantically equivalent objects. The
database scenario exhibits both.

The homonym conflict is seen in the following
example. The Flight Schedule model has an object called
requirement that refers to a required administrative event.
The Flight Physiology object-model has an object called
requirement that refers to a required aircrew qualification.
Though these objects have the same name, they are not
semantically related. Figure 20 illustrates the problem.

Figure 20 Example of Homonym Object Name Conflict

The synonym conflict is seen in the following
example. The classified library object-model has an object
called docholder that refers to a person who has custody of a
document. The NATOPS Department object-model has an object
called aircrew that refers to a person who also has custody of
a document. These two objects are semantically equivalent and
represent a person who has custody of a document, yet they
have different names. Figure 21 illustrates the problem.
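
The two kinds of object name conflict can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: each object is assumed to have been annotated by hand
with the concept it denotes, and the CONCEPTS table, its concept
labels, and the function name are hypothetical rather than part of
the scenario. Two objects from different models conflict as
homonyms when the names match but the concepts differ, and as
synonyms when the concepts match but the names differ.

```python
# Hypothetical catalogue: (model, object name) -> concept the object denotes.
# The concept labels are assumptions added for illustration only.
CONCEPTS = {
    ("FlightSchedule", "requirement"): "administrative event",
    ("FlightPhysiology", "requirement"): "aircrew qualification",
    ("Library", "docholder"): "document custodian",
    ("NATOPS", "aircrew"): "document custodian",
}


def name_conflicts(concepts):
    """Return (kind, first, second) for every cross-model object name conflict."""
    found = []
    items = sorted(concepts.items())
    for i, ((model_a, name_a), concept_a) in enumerate(items):
        for (model_b, name_b), concept_b in items[i + 1:]:
            if model_a == model_b:
                continue  # only compare objects from different models
            pair = (f"{model_a}.{name_a}", f"{model_b}.{name_b}")
            if name_a == name_b and concept_a != concept_b:
                found.append(("homonym",) + pair)
            elif name_a != name_b and concept_a == concept_b:
                found.append(("synonym",) + pair)
    return found


for conflict in name_conflicts(CONCEPTS):
    print(conflict)
```

The comparison itself is trivial. The hard part, which the sketch
assumes away, is deciding which concept each object denotes.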
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Figure 21 Example of Synonym Object Name Conflict 


b. Object structure conflicts 

There are three types of object structure
conflicts: generalization-specialization, whole-part, and
generalization-specialization to whole-part structure
conflicts. With these conflicts one must focus on the
attributes required in the query or application precipitating
the conflict. Object method conflicts are treated as a
special case and addressed later.

In the generalization-specialization structure conflict, the
attributes of an object in one model are contained in a
generalization-specialization structure of another model.
Consider the name, rank, and social security number
attributes. In the NATOPS department model this information
is contained in the aircrew object. In the flight physiology
model this information is contained in the
servicemember-officer generalization-specialization
structure. However, the grade attribute of the aircrew object
encompasses both the rank and the rate held in the officer and
enlisted objects of the generalization-specialization
structure in the flight physiology model. Figure 22
illustrates this situation.

Figure 22 Generalization-Specialization Conflicts

In a whole-part structure conflict, the attributes
of an object in one model are contained in a whole-part
structure of another model. Consider the attributes name and
crew number. This information is contained in the aircrew
object in the NATOPS model, and in the whole-part structure of
the staffcrew object and the studentcrew object in the flight
schedule object model. (The generalization-specialization
structure aircrew-staffPPC, aircrew-staffNFO, etc., does not
cause an additional conflict due to the concept of inheritance
of the generalization objects.) Figure 23 illustrates this
situation.

Figure 23 Whole-Part Structure Conflicts

In the generalization-specialization to whole-part
structure conflict, the attributes of interest are contained in
a generalization-specialization structure in one model and a
whole-part structure in another model. Consider the attributes
name, grade, and crew number, where grade is either the rank
of an officer or the rate of an enlisted member. This
information is contained in the generalization-specialization
structure in the flight physiology model and in a whole-part
structure in the flight schedule model. Figure 24 illustrates
this situation.

Figure 24 Generalization-Specialization to Whole-Part
Structure Conflicts

In the generalization-specialization to
generalization-specialization conflict, the attributes of
interest in one generalization-specialization structure are
found in a different generalization-specialization structure
of another object model. Consider the attributes grade and
name, where grade is either an officer’s rank or an enlisted
member’s rate. This information is found in a
generalization-specialization structure in the flight schedule
model as well as a generalization-specialization structure in
the flight physiology model. Figure 25 illustrates this
situation.

In the whole-part to whole-part conflict, the
attributes of interest in one whole-part structure are found
in a different whole-part structure of another object model.
Our scenario does not contain an example of this conflict, so
we modify the NATOPS model to make publication a whole-part
structure. The natlib object contains the name of the NATOPS
library (assume we can now have more than one) and has NATOPS
position publications (natpub) and crew station maintenance
manuals (crewman) as parts. The attributes of interest are
the library name and all the publications contained in the
libraries. The modified model therefore contains a whole
object natlib with the attribute LibName, and the parts natpub
and crewman, which contain all the publications. The
classified library model has a whole object called library and
two parts. One part is publication, which contains all the
publication names in the library, and the other is abstracts
of document holders. Figure 26 illustrates the modification
and the conflict situation.
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Figure 26 Whole-Part to Whole-Part Structure Conflicts 


2. Attribute Level Conflicts 
Attribute level conflicts occur when the heterogeneous
databases use different delineations to represent similar
attributes of abstractions. These can be decomposed into
attribute name conflicts, attribute constraint conflicts, and
attribute structure conflicts.
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a. Attribute name conflicts 

Attribute name conflicts are of two types. The
first is a homonym problem, exhibited when the same name is
used to denote semantically different attributes. The second
is a synonym problem, which occurs when different names are
used to denote semantically equivalent attributes. The
database scenario exhibits both.

The homonym conflict is exhibited in the following
example. In the classified library model, the attribute
‘name’ refers to the name of a person. In the flight
physiology model, ‘name’ refers to the name of a requirement,
not a person.

The synonym conflict is seen in the following
example. In the flight physiology model the attribute name
for a social security number is ‘soc’. In the NATOPS model
the attribute name for a social security number is ‘socsec’.


b. Attribute constraint conflicts 
There are two types of attribute constraint
conflicts: data type conflicts and attribute
integrity-constraint conflicts. Data type conflicts occur
when semantically equivalent attributes in different models
have different data types or data lengths. In our example the
attribute that represents a social security number in the
classified library model, named ‘SS’, is of type numeric. In
the flight physiology model the same attribute, named ‘soc’,
is of type character. Similarly, in one model the length of
the social security field could be 9, while in another it
could be 11 to accommodate two hyphens. An example is
"045-62-3436" versus "045623436".

The attribute integrity-constraint conflict occurs
due to dissimilar definitions of the constraints on similar
attributes in the different models. In our example, in the
flight physiology model the allowed values of rank are Ens,
Ltjg, LT, LCDR, CDR, CAPT, RADM, and VADM. In the flight
schedule model, the allowed values are Ens, Ltjg, LT, LCDR,
CDR, and CAPT.
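
Both kinds of attribute constraint conflict can be made concrete
with a short Python sketch. It is a minimal illustration, not a
prescribed mechanism: the function names are hypothetical, and the
choice of a nine-character digit string as the common form of a
social security number is an assumption. The first function
reconciles the numeric, 9-character, and 11-character forms; the
second reports the values that only one of two rank domains allows.

```python
def normalize_ssn(value):
    """Reduce an SSN held as a number or a string to a nine-character digit string.

    A numeric field drops leading zeros (045623436 is stored as 45623436),
    so the digits are padded back to nine places.
    """
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    if not digits or len(digits) > 9:
        raise ValueError(f"not a social security number: {value!r}")
    return digits.zfill(9)


# Allowed values of rank, as listed in the text for each model.
PHYSIOLOGY_RANKS = {"Ens", "Ltjg", "LT", "LCDR", "CDR", "CAPT", "RADM", "VADM"}
SCHEDULE_RANKS = {"Ens", "Ltjg", "LT", "LCDR", "CDR", "CAPT"}


def constraint_gap(domain_a, domain_b):
    """Values that exactly one of the two domains allows."""
    return sorted(domain_a ^ domain_b)


print(normalize_ssn("045-62-3436"), normalize_ssn(45623436))
print(constraint_gap(PHYSIOLOGY_RANKS, SCHEDULE_RANKS))
```

A value in the gap, such as RADM, is legal in one model and would
violate the integrity constraint of the other.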


c. Attribute structure conflicts 

Attribute structure conflicts occur when a group of
attributes in one model is semantically equivalent to a
single attribute or a lesser number of attributes in another
model. These occur when semantically similar objects have a
different number of attributes. This can be further
decomposed into a missing attribute conflict and a missing but
implicit attribute conflict.

In the missing attribute conflict, one object is missing
attributes that a semantically equivalent object contains.
The attribute is truly missing and cannot be deduced. In our
example, the flight physiology model has an object called
requirement. Its attributes are type, completiondate, and
soc. The NATOPS department model object test/checkflt has
type, date, issuedby, socsec, score, and bc as attributes.
The objects are semantically similar. Requirement.type is
equivalent to test/checkflt.type. Requirement.completiondate
is equivalent to test/checkflt.date. Requirement.soc is
equivalent to test/checkflt.socsec. Test/checkflt has the
additional attributes issuedby, score, and bc. None of these
can be deduced in the requirement object.
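
The missing attribute check in this example can be sketched as
follows. The attribute lists and equivalences come from the text;
the data layout and the function name are assumptions made for the
illustration.

```python
# Attributes of the two semantically similar objects, as given in the text.
REQUIREMENT = ["type", "completiondate", "soc"]
TEST_CHECKFLT = ["type", "date", "issuedby", "socsec", "score", "bc"]

# requirement attribute -> equivalent test/checkflt attribute.
EQUIVALENT = {"type": "type", "completiondate": "date", "soc": "socsec"}


def missing_attributes(smaller, larger, equivalent):
    """Attributes of `larger` that have no equivalent in `smaller`."""
    unmapped = [name for name in smaller if name not in equivalent]
    if unmapped:
        raise ValueError(f"no equivalence recorded for: {unmapped}")
    covered = set(equivalent.values())
    return [name for name in larger if name not in covered]


print(missing_attributes(REQUIREMENT, TEST_CHECKFLT, EQUIVALENT))
```

Whether a reported attribute is truly missing or only implicit
still has to be judged by the integrator; the sketch only lists
the candidates.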
In the missing but implicit attribute conflict,
attributes in one object are missing, but can be deduced. This
can be a subtle distinction. Our scenario does not have a
good example of this. To illustrate, we change the last
example slightly. The requirement object remains the
same. The test/checkflt object now has the attributes
type, date, socsec, and name. Name refers to the name of the
person associated with the socsec (social security number).
Now the missing attribute name can be deduced in the
requirement object, since requirement.soc identifies the
person to whom the name belongs.
3. Object-Attribute Level Conflicts 
Object-attribute level conflicts occur when
information in one model is represented by an attribute and in
another by an object. The database scenario does not have a
good example of this. For our purposes assume the NATOPS
model has an object called aircraft. The object aircraft
contains the attributes side number, type, and version. The
flight schedule model has an attribute called aircraft that
identifies the aircraft by side number. The implication is
that knowing the side number implies type and version. So, the
information in an attribute in one model is contained in an
object of another model.
4. Method Conflicts 
Method conflicts fall into two general classes. The
classes are divided into conflicts that concern methods unique
to one model and conflicts that concern the global or
integrated model. The types of method conflicts exhibited are
dependent on the type of homogenizing strategy employed.
a. Method name conflicts 
With methods unique to one model, the primary
conflict that arises in a global view is a method name
conflict. This occurs when two heterogeneous models contain
methods with the same name and the method is employed
while exploiting a global schema. In our example, the flight
physiology model has a method called planning list that works
in conjunction with the attributes of the requirement,
enlisted, and officer objects of that model. The NATOPS
department model also has a method called planning list. It
was designed to work with the objects of that model. So, in
an actual or virtual global schema, a conflict would occur if
both methods are transported to the global schema.



b. Method message and instance connection conflicts 

The second class of method conflicts occurs when a
method in a heterogeneous model is extended to apply to other
heterogeneous models in a real or virtual global schema.
Again the possibility of a naming conflict exists. However,
the conflicts here are more likely to involve message
connections and/or instance connections. The
message and instance connections relate to attributes
that exist in the local model or view. Extending the method
to the global model or global schema would entail establishing
message or instance connections to other heterogeneous models.
These connections could cause the manifestation of all
previously mentioned schematic conflicts.


B. DATA CONFLICTS 
Data conflicts are of two distinct types: inconsistencies
and different representations for the same data. Data
conflicts are independent of the schema involved.
1. Inconsistencies 
Inconsistencies are generally due to failures in
maintaining a database, such as failing to keep the database
up to date and failing to enforce integrity rules (Kim, 1991,
p. 17). The problems with inconsistencies can be expressed as
data entry errors or obsolete data.

Data entry errors occur when equivalent attributes in
different object models, which are expected to have the same
value, have different values. In our example the attribute
rank appears in the flight schedule and flight physiology
models. Rank is part of the same abstraction. It naturally
follows that an instance of similar objects in the two
different models should have the same rank. If, however, in
one database the rank of John Smith is LT, and in the other
the rank of the same instance is Ltjg, we have a case of wrong
data in one of the models. If this was due to an entry error,
it would be classified as an incorrect data entry. If John
Smith was recently promoted to the rank of LT, and this was
updated in one model and not the other, this would be a case
of obsolete data.
2. Different Representations for the Same Data 
The three aspects of data that lead to its 
representation are expressions, units, and granularity. These 
are the areas of representational conflict we will examine 
further. 
a. Different expressions 
Conflicts in expression can occur when two models
use the same data, but express it differently. In our example
the data in rank can be expressed as Ens, Ltjg, LT, LCDR, CDR,
or CAPT. This data could also be expressed as O-1, O-2, O-3,
O-4, O-5, or O-6. In the U.S. Navy military rank structure
these codes are different expressions for the same data.
Using the same example, ensign, lieutenant junior grade,
lieutenant, etc., could be spelled out instead of using the
abbreviations. This would be a case of using different words
or strings for the same data.
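
One way to reconcile different expressions is a table that maps
every known expression to a single canonical one. The Python
sketch below assumes the abbreviation is chosen as the canonical
form and writes the pay-grade codes with the letter O; the table
layout and the function name are hypothetical.

```python
# Canonical abbreviation -> other expressions of the same rank.
# Pay-grade codes follow the order given in the text (Ens = O-1 ... CAPT = O-6).
RANK_EXPRESSIONS = {
    "Ens": ["O-1", "ensign"],
    "Ltjg": ["O-2", "lieutenant junior grade"],
    "LT": ["O-3", "lieutenant"],
    "LCDR": ["O-4", "lieutenant commander"],
    "CDR": ["O-5", "commander"],
    "CAPT": ["O-6", "captain"],
}

# Reverse index, compared case-insensitively: any expression -> canonical form.
_CANONICAL = {}
for abbreviation, others in RANK_EXPRESSIONS.items():
    for expression in [abbreviation] + others:
        _CANONICAL[expression.lower()] = abbreviation


def canonical_rank(expression):
    """Map any known expression of a rank to its abbreviation."""
    try:
        return _CANONICAL[expression.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown rank expression: {expression!r}") from None


print(canonical_rank("O-3"), canonical_rank("Lieutenant"), canonical_rank("LT"))
```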
b. Different units for the same data

These conflicts arise when two models use different
units for similar numeric data. In our example we could
add an attribute qual duration of type numeric to both
the NATOPS department model and the flight physiology model.
In one the number could represent months, while in
the other it could represent years. So, even if both
attributes hold the same value, they represent different
things.

c. Different granularity 

Conflicts in granularity occur when two models use
values from domains of different cardinalities for the same
data (Kim, 1991, p. 17). For example, in our scenario the
NATOPS model has an attribute score. The data type is a
numeric in the range 0.0 to 4.0, reflecting a 4.0 grading
scale. We can add a semantically equivalent attribute to the
flight physiology model and make it of an enumerated data type
with the values fail, very poor, poor, satisfactory, good, very
good, and outstanding. The domains now represent the same
data, but use different granularity.


C. CONCLUSION 

In this chapter we attempted to develop a complete
framework for enumerating and classifying schematic and data
conflicts in an object-oriented database model. In the next
chapter several ideas are proposed to resolve these conflicts.
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VII. PROPOSED SOLUTIONS TO SCHEMATIC AND DATA CONFLICTS 


In this chapter we consider the problems encountered
and the feasible solutions for querying the global schema. We
focus on integrating the classified library and the
NATOPS department databases. Problems encountered when
adding, deleting, or modifying data in a global schema are not
addressed.

To allow queries to span these two databases, a
federated approach is used. As indicated earlier, the first
step in this approach is to transform the component database
schemas into equivalent schemas in the object-oriented model.
This was accomplished in Chapter IV. The second step is to
examine the component databases in the object-oriented model
and integrate them into a global schema after identifying and
resolving the schematic conflicts.


A. SCHEMA INTEGRATION RESOLUTION 
The first step in this process is conflict identification. 
To aid in identifying the schema and data conflicts we rely on 
the framework developed in Chapter VI. 
1. Object Level Conflict Resolutions 
a. Object structure conflict resolutions 
We begin by examining the library and NATOPS
department object models for object structure conflicts. The
classified library model, shown in Figure 28, has a
generalization-specialization structure made up of a
generalization object ‘documentholder’ and specialization
objects ‘studentholder’ and ‘staffholder’. This structure is
semantically equivalent to the NATOPS object ‘aircrew’, shown
in Figure 27. This is the only object structure conflict
present, and it is a generalization-specialization conflict.
To resolve structure conflicts in preparation for integration,
we remap the structure of the simpler model so that it matches
the structure of the more complex model.
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Figure 27 NATOPS Department Object Model 
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Figure 28 Classified Library Object Model 


When examining the two structures, it is clear
that the classified library model is the more complex model.
To remap the NATOPS structure, we rely on the attribute
‘position’ to distinguish between student and staff personnel.
This attribute starts with ‘stu’ for student aircrew. For
example, a staff pilot is entered as pilot for position in the
NATOPS database, while a student pilot is entered as stupilot.
To develop the structure we use the aircrew object as a
generalization object, and add staffaircrew and stuaircrew as
specialization objects. The position and crewnumber attributes
are moved to the corresponding specialization objects. Once
the models are remapped into an equivalent structure, the
object structure conflicts are resolved. Figure 29
illustrates the NATOPS data model remapping.
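
The remapping rule can be sketched as a function that splits one
flat aircrew record into a generalization part and a
specialization part. This is a minimal sketch under stated
assumptions: records are represented as Python dictionaries, the
sample record is invented for illustration, and the function name
is hypothetical. Only the ‘stu’ prefix rule and the moved
attributes come from the text.

```python
# Attributes that move from aircrew to the specialization object.
SPECIALIZED = ("position", "crewnumber")


def remap_aircrew(record):
    """Split a flat NATOPS aircrew record into (kind, general, special).

    `kind` names the specialization object, chosen by the 'stu' prefix
    of the position attribute.
    """
    kind = "stuaircrew" if record["position"].startswith("stu") else "staffaircrew"
    general = {k: v for k, v in record.items() if k not in SPECIALIZED}
    special = {k: record[k] for k in SPECIALIZED}
    return kind, general, special


# Invented sample record, for illustration only.
sample = {"socsec": "046623436", "grade": "LT", "firstname": "John",
          "lastname": "Smith", "position": "stupilot", "crewnumber": "0012"}
print(remap_aircrew(sample))
```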
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Figure 29 NATOPS Department Remapped 


b. Object name conflict resolutions 
The next conflicts to resolve are the object name
conflicts. When resolving these conflicts we focus on the
library model and the remapped NATOPS model. The following
object name conflicts are present: Library.documentholder is
equivalent to NATOPS.aircrew, Library.studentholder is
equivalent to NATOPS.stuaircrew, Library.staffholder is
equivalent to NATOPS.staffaircrew, and Library.document is
equivalent to NATOPS.publication. These object name conflicts
are resolved in the global schema by using a global object
that corresponds to each equivalent pair. The mapping to/from
the local objects is handled by a look-up table accessible by
the global controller. Information in the look-up tables is
accessed at run-time by the global controller to direct
queries to the component databases. The global controller is
explained in detail in Section C.

For our example, Library.documentholder and 
NATOPS.aircrew correspond to a global generalization object 
named holder. Library.studentholder and NATOPS.stuaircrew 
correspond to a global specialization object called student. 
Library.staffholder and NATOPS.staffaircrew correspond to a 
global specialization object called staff. Library.document 
and NATOPS.publication correspond to a global object called 
document. The Library.library and the NATOPS.test/checkflt 
objects have no semantic equivalents. Thus, they each become 
global objects. 
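
The object look-up table described here might be represented as
follows. The correspondences are the ones listed above; the
dictionary layout and the two function names are assumptions, since
the text does not specify how the table is stored.

```python
# Global object -> local object name in each component database.
OBJECT_LOOKUP = {
    "holder": {"Library": "documentholder", "NATOPS": "aircrew"},
    "student": {"Library": "studentholder", "NATOPS": "stuaircrew"},
    "staff": {"Library": "staffholder", "NATOPS": "staffaircrew"},
    "document": {"Library": "document", "NATOPS": "publication"},
    # Objects with no semantic equivalent become global objects as they are.
    "library": {"Library": "library"},
    "test/checkflt": {"NATOPS": "test/checkflt"},
}


def to_local(global_name, database):
    """Local object for a global object, or None if the database has none."""
    return OBJECT_LOOKUP[global_name].get(database)


def to_global(database, local_name):
    """Global object that a local object maps to."""
    for global_name, local_names in OBJECT_LOOKUP.items():
        if local_names.get(database) == local_name:
            return global_name
    raise KeyError(f"{database}.{local_name} is not in the look-up table")


print(to_local("holder", "NATOPS"), to_global("Library", "studentholder"))
```

A None result tells the controller that a query on that global
object need not be sent to the database in question.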

2. Attribute Level Conflict Resolutions 
With the object level conflicts resolved, we turn our
attention to the attribute level conflicts. To determine what
attribute level conflicts are present, we examine the
attributes of the semantically equivalent objects. Again, we
use the conflict framework developed in Chapter VI as a guide.
a. Attribute name conflict resolutions 

The first apparent problems are attribute name
conflicts. Library.documentholder has an attribute called
‘ss’ and NATOPS.aircrew has an attribute called ‘socsec’.
Both refer to a social security number. The attributes
‘publication’ in Library.document and ‘pubname’ in
NATOPS.publication are semantically equivalent, and
‘docnumber’ in Library.document and ‘pubnum’ in
NATOPS.publication are also semantically equivalent. These
depict the synonym conflict. They are resolved in the global
schema by using a global object attribute that corresponds to
each equivalent pair. The mapping to/from the local object
attributes is handled by a look-up table accessible by the
global controller.

b. Attribute constraint conflict resolutions 

The Library attribute ‘ss’ is defined as a nine
digit numeric type, and the NATOPS attribute ‘socsec’ is
defined as a nine place character type. The NATOPS ‘crewnum’
is defined as a four place character while the Library
‘crewnum’ is defined as a two place character. The attributes
‘publication’ in Library.document and ‘pubname’ in
NATOPS.publication are semantically equivalent, and ‘pubnum’
in publication and ‘docnumber’ in document are semantically
equivalent. The types of these equivalent attributes are the
same, but the lengths are different. These are all attribute
constraint conflicts. Again, the resolution of this at the
query level is provided by a look-up table accessible to the
global controller.
c. Attribute structure conflict resolutions 

The ‘name’ attribute in Library.documentholder is
semantically equivalent to ‘grade’ plus ‘firstname’ plus
‘lastname’ in NATOPS.aircrew. This is an attribute structure
conflict where a group of attributes in one model is
semantically equivalent to a single attribute in another
model. The resolution of this conflict is accomplished at two
levels. First, at the global schema level, the object
corresponding to this pair contains the more detailed
attribute structure (i.e., grade, lastname, and firstname).
Second, the global controller uses a look-up table to
decompose and translate a query on the global schema into
subqueries on the corresponding local schemas. An
example of an element in a look-up table to resolve this
conflict is as follows: Library.name = NATOPS.grade +
NATOPS.firstname + NATOPS.lastname.
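
A look-up element of this kind can be sketched in Python as a
mapping from the single attribute to the ordered group it
corresponds to. The table layout, the function name, and the use
of a single space as the separator are assumptions; the text gives
only the equivalence itself.

```python
# (database, attribute) -> ordered group of equivalent attributes elsewhere.
STRUCTURE_LOOKUP = {
    ("Library", "name"): [("NATOPS", "grade"),
                          ("NATOPS", "firstname"),
                          ("NATOPS", "lastname")],
}


def compose(target, record, separator=" "):
    """Build the single-attribute value from a record holding the group."""
    parts = STRUCTURE_LOOKUP[target]
    return separator.join(record[attribute] for _, attribute in parts)


natops_record = {"grade": "LT", "firstname": "John", "lastname": "Smith"}
print(compose(("Library", "name"), natops_record))
```

Going the other way, from the single attribute back to its parts,
is harder, because splitting a free-form name string reliably
depends on how the library data was entered. That is one reason to
keep the more detailed structure in the global schema.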

Additionally, the attributes ‘lockernumber’ and
‘secretfoldernum’ in Library.documentholder have no semantic
counterpart in NATOPS.aircrew. Library.studentholder has an
additional attribute ‘classnum’ and NATOPS.stuaircrew has an
additional attribute ‘position’. They are, however,
semantically unrelated. Library.staffholder and
NATOPS.staffaircrew are both specialization objects of the
corresponding Library.documentholder and NATOPS.aircrew
generalization pair. None of the specialization attributes in
these specialization objects correspond to each other. These
are all missing attribute conflicts. To resolve these, the
attributes in the global schema represent the union of the
attributes in the local schemas.
3. Object-Attribute Level Conflict Resolutions 

Our example does not contain any object-attribute
level conflicts. These conflicts are present when an
attribute in one model corresponds to an object in another
model. At the global schema level, the solution is to
transform the attribute into an object. This is similar to
the structure remapping presented earlier, whereby the
structure of one model is remapped so that both models to be
integrated have similar objects. At run-time, the global
controller uses look-up tables to resolve the conflict between
the global schema object and the local schema.

4. Method Conflict Resolutions 

The method conflicts we are concerned with are the
conflicts that arise when local methods are extended to the
global schema. Local methods are designed to act on
attributes of local objects. To use these methods in a global
view, the appropriate message connections must be mapped to
the appropriate global objects. The resolution of this
conflict is dependent on the resolution of all other
conflicts. To simplify this exercise, we build a
global schema without the intention of extending local methods
to the global schema.


B. CONSTRUCTING THE GLOBAL SCHEMA 
1. The Global Objects 

The first two candidate global objects are the
non-equivalent objects. These are the Library.library and the
NATOPS.test/checkflt objects. In this case, for simplicity,
the global objects maintain the same names and attributes.

The rest of the global objects come from the
semantically equivalent pairs. From the Library.documentholder
and NATOPS.aircrew pair we build a generalization object
called ‘holder’ and include the attributes grade, firstname,
lastname, lockernumber, secretfoldernum, and socsec. From the
Library.studentholder and NATOPS.stuaircrew pair we build a
specialization object called ‘student’. This object includes
the attributes crewnum, classnum, and position. The
Library.staffholder and NATOPS.staffaircrew pair yields the
global specialization object ‘staff’ with the attributes
position, crewnum, safenum, and officenum. The final global
object comes from the Library.document and NATOPS.publication
pair. We name this object ‘document’ and give it the
attributes name, number, changenum, status, classification,
and socsec.
2. The Global Schema Structure 
Figure 30 depicts the global schema when the NATOPS
and Library models are combined.
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Figure 30 The Global Schema 


C. THE GLOBAL CONTROLLER 
The global controller was briefly mentioned earlier. This
is an important component in the federated approach. It
maintains the definition of the global schema and acts as a
coordinator and translator. When it receives a global query
from a component database, it translates this query into an
equivalent query on the global schema. This global schema
query is decomposed and translated into subqueries that are
sent to the corresponding local databases for processing. The
results are collected and any corresponding data content
conflicts are resolved. These results are then reformatted
and sent back to the originating component database.

To illustrate this we look at the following query on the 
global schema from a relational user. The user wants a list 
of title and serial number of all publications checked out to 
a person whose social security number is 046-62-3436. 

The global controller first transforms this into an 
equivalent query on the global schema. The information needed 
to make this change comes from a series of look-up tables. 

The resultant query is then decomposed and translated into
queries to the component databases. Again, extensive use of
look-up tables enables this process. In our case we have two
component queries. The first is a query to the library
database:

SELECT Publication, DocNumber
FROM Document, StudentHolder, StaffHolder
WHERE SS = 046623436

The second is a query to the NATOPS database:

SELECT PubName, PubNum
FROM Publication, StaffAircrew, StuAircrew
WHERE Socsec = "046623436"

The results of the component queries are collected and any
corresponding data content conflicts are resolved by the
global controller. The results are then reformatted and sent
back to the requesting site.
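
The decomposition step can be sketched in Python as a function
that builds each component query from look-up tables. This is an
illustration only: the LOCAL_SCHEMAS layout and the function name
are hypothetical, the global attribute names name, number, and
socsec are taken from the global document object, and the queries
are assembled as plain strings purely to show the translation. A
real controller would pass values through the parameter mechanism
of each database interface instead of formatting them into the
query text.

```python
# Per-database look-up data: local tables, global -> local attribute names,
# and whether the social security number is stored as a number or a string.
LOCAL_SCHEMAS = {
    "Library": {
        "tables": ["Document", "StudentHolder", "StaffHolder"],
        "attributes": {"name": "Publication", "number": "DocNumber", "socsec": "SS"},
        "ssn_is_numeric": True,
    },
    "NATOPS": {
        "tables": ["Publication", "StaffAircrew", "StuAircrew"],
        "attributes": {"name": "PubName", "number": "PubNum", "socsec": "Socsec"},
        "ssn_is_numeric": False,
    },
}


def build_subquery(database, select, ssn):
    """Translate a global request for `select` attributes into a local query."""
    schema = LOCAL_SCHEMAS[database]
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError(f"not a social security number: {ssn!r}")
    columns = ", ".join(schema["attributes"][name] for name in select)
    tables = ", ".join(schema["tables"])
    literal = digits if schema["ssn_is_numeric"] else f'"{digits}"'
    key = schema["attributes"]["socsec"]
    return f"SELECT {columns} FROM {tables} WHERE {key} = {literal}"


for db in LOCAL_SCHEMAS:
    print(build_subquery(db, ["name", "number"], "046-62-3436"))
```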

All the conflicts and solutions are captured for use by 
the global controller. Additionally, the global controller 
maintains the definition of the global schema and acts as a 
coordinator and translator. At run-time it attempts to 
resolve the following data conflicts. 

1. Data Inconsistencies Conflict Resolutions 

The first conflict is caused by inconsistencies.
These are generally due to failures in maintaining a database,
such as failing to keep the database up to date. The global
controller may not be able to resolve this conflict. One
possible solution is to prioritize the component databases.
If one database has a greater update rate than another, the
assumption is that the data it contains is the most current.
This can resolve some conflicts. However, it is not
foolproof, since it cannot correct for entry errors.
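
The prioritization idea can be sketched as follows. The priority
order shown is an assumption made for the example (the text does
not say which database is updated more often), and the function
name is hypothetical.

```python
# Component databases, most frequently updated first (assumed order).
PRIORITY = ["NATOPS", "Library"]


def resolve_inconsistency(values, priority=PRIORITY):
    """Pick one value from {database: value}; report whether they disagreed.

    When the databases disagree, the value from the highest-priority
    database that returned one is used.
    """
    if not values:
        raise ValueError("no values to resolve")
    disagreed = len(set(values.values())) > 1
    for database in priority:
        if database in values:
            return values[database], disagreed
    raise ValueError("no value came from a prioritized database")


print(resolve_inconsistency({"Library": "Ltjg", "NATOPS": "LT"}))
```

Returning the disagreement flag lets the controller log or report
the conflict rather than hide it, which matters because the
preferred database may itself contain the entry error.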

2. Different Representations for the Same Data Conflict
Resolutions

Our framework has three categories of this conflict:
different expressions for the same data, different units for
the same data, and different granularities for the same data.

In a global query for an individual’s grade, the local NATOPS
model returns ‘LT’ and the Library model returns ‘O-3’. This
is an example of different expressions for the same data.
They both correspond to the U.S. Navy rank of lieutenant. The
global controller must have some means of determining this
similarity. Again, a solution is a look-up table.

The model does not have a different unit data
conflict. This conflict occurs when two models use different
units for similar numeric data. To illustrate this conflict,
we examine two local databases that have an attribute for
flight time. Querying one database yields the time in
minutes, and the other yields the time in hours. One solution
for translating the times is for the global controller to use
a conversion formula.
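
A conversion formula of this kind is small enough to show in full.
The sketch below assumes the global schema stores flight time in
minutes; that choice and the function names are assumptions.

```python
# Minutes in one unit of each local representation.
MINUTES_PER_UNIT = {"minutes": 1, "hours": 60}


def to_minutes(value, unit):
    """Convert a local flight time to the (assumed) global unit, minutes."""
    return value * MINUTES_PER_UNIT[unit]


def from_minutes(minutes, unit):
    """Convert a global flight time back to a local unit."""
    return minutes / MINUTES_PER_UNIT[unit]


print(to_minutes(1.5, "hours"), from_minutes(90, "hours"))
```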

As in the previous case, this model does not have a
different granularity data conflict. These conflicts occur
when two models use values from domains of different
cardinalities for the same data. To illustrate this, the
NATOPS object test/checkflt has an attribute score. The data
entered here is on a scale of 0.0 to 4.0. If another
component database had a similar attribute where data is
entered on a scale of 1 to 100, the global controller would
need a mechanism to translate between the two. Again,
possible solutions are look-up tables or conversion formulas.
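
As one possible conversion formula, the sketch below rescales
linearly between the two ranges, so that 0.0 maps to 1 and 4.0
maps to 100. The linear rule is an assumption made for
illustration; an actual grading policy might call for a look-up
table of bands instead. The function names are hypothetical.

```python
def to_hundred_scale(score):
    """Map a 0.0-4.0 score onto the integer scale 1-100 (assumed linear)."""
    if not 0.0 <= score <= 4.0:
        raise ValueError(f"score out of range: {score}")
    return round(1 + 99 * score / 4.0)


def to_four_scale(points):
    """Map a 1-100 score back onto the 0.0-4.0 scale (assumed linear)."""
    if not 1 <= points <= 100:
        raise ValueError(f"score out of range: {points}")
    return (points - 1) / 99 * 4.0


print(to_hundred_scale(2.5), to_four_scale(to_hundred_scale(2.5)))
```

Because the two domains differ in granularity, a round trip does
not always return the original value exactly; the translation is
only as precise as the coarser of the two scales.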
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D. CONCLUSION 

This chapter proposed a method of resolving the identified 
conflicts between two component databases. To build a global 
schema that encompasses more databases, each successive 
database would be added in similar fashion to the resultant 
global schema of the previous component databases. In
essence, building a global database with multiple component
databases is an iterative process. The principles remain the
same. However, the overall complexity increases. As this
happens the importance of the global controller is magnified.
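
The iterative process can be sketched as a fold over the
component schemas. In this sketch a schema is assumed to be a
mapping from object names to sets of attribute names, and the
integrate step is only a placeholder that takes the union of
attributes; in practice it would apply the conflict resolutions
proposed in this chapter. All names are hypothetical.

```python
from functools import reduce


def integrate(global_schema, component_schema):
    """Placeholder integration step: union attributes per object name."""
    merged = {name: set(attrs) for name, attrs in global_schema.items()}
    for name, attrs in component_schema.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


def build_global_schema(component_schemas):
    """Add each component schema, in turn, to the running global schema."""
    return reduce(integrate, component_schemas, {})
```

Each call to integrate takes the global schema produced so far
and one more component schema, which mirrors adding each
successive database to the result of the previous ones.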
VIII. SUMMARY AND CONCLUSIONS


The framework developed in this thesis provides a 
comprehensive enumeration and classification of schema and 
data conflicts among component databases in an object-oriented 
database model. The schema conflicts are broadly classified 
by the level at which they could occur. These levels are
object level conflicts, attribute level conflicts, and
object-attribute level conflicts. The data conflicts are
classified as inconsistencies and different representations
for the same data. The following is a summary of these
conflicts.


A. SUMMARY OF SCHEMA CONFLICTS 
1. Object Level Conflict Summary 
Object level conflicts occur when the heterogeneous
databases use different representations for similar objects or
abstractions. Object level conflicts are decomposed into 
object name conflicts and object structure conflicts. 
2. Attribute Level Conflict Summary 
Attribute level conflicts occur when the heterogeneous
databases use different delineations to represent similar
attributes. These conflicts are decomposed into attribute
name conflicts, attribute constraint conflicts, and attribute
structure conflicts.
3. Object-Attribute Level Conflict Summary
Object-attribute level conflicts occur when the same 
information is represented by one or more attributes in one 
model and as an object in another model. 
4. Method Conflict Summary 
The types of method conflicts are dependent on the
strategy chosen for defining methods in the global model. One
strategy would extend methods at the local schema level to
apply to the global schema. In this situation, methods have
to be rewritten or mapped into the final global schema that
results from resolving all other conflicts.


B. SUMMARY OF DATA CONFLICTS 
1. Inconsistent Data Conflict Summary 
Inconsistent data is generally due to data entry
errors or failures in maintaining a database. Failures in
maintaining a database usually manifest themselves as failing
to keep the database up to date and failing to enforce
semantic integrity rules.
2. Different Representations for the Same Data Conflict 
Summary 
The three aspects of data that lead to different
representations are different expressions for the same data
(e.g., U.S. Navy, USN), different units (e.g., inches, feet),
and different granularity (e.g., a scale from 1 to 4 and a
scale of 1 to 10).
C. APPLICATIONS

The Department of Defense is in the process of evaluating
military information systems with regard to the Corporate
Information Management (CIM) initiative. Many of the initial
problems identified deal with redundant information systems.
There is a need to access multiple independent information
systems and to use the contained information for a strategic
advantage at the Department of Defense level.

One solution is to consolidate these systems along lines 
of functionality, and rebuild them from scratch. The goal 
would be to reduce redundancies and foster interoperability 
between the remaining systems. This may not be feasible in 
every situation. An alternate solution is to organize
existing systems along the lines of functionality, and then
homogenize them so that they can share data. This is where
resolving the heterogeneity conflicts becomes important.


D. FUTURE RESEARCH 
Applying the framework and proposed solutions to build a 
global schema from a number of related component heterogeneous
databases is the logical next step. Additional research is 
needed in the following areas. 
1. Prototype Construction 
The conflict framework and proposed solutions could be
the basis of a prototype for building an information systems
homogenizing layer. The development of a workable prototype
could impact how the Department of Defense proceeds with its
information consolidation efforts.
2. Development of Tools Based on Framework 
The framework could be the foundation for a set of 
workable conflict identification tools. These tools could 
automate the identification and resolution of semantic and 
data conflicts found in similar databases prior to attempted 
integration. With the conflicts identified, the integration 
process should be significantly shortened. 
3. Construct Artificial Intelligence (AI) Techniques to 
Resolve Semantic Issues 
The conflict framework could be the foundation for an
AI system that could automate the integration process between
numerous component databases.